


APPLICATION for patent



Inventors: Iris Pecker, Israel Vlodavsky and Elena Feinstein






Title: 



[...] HAVING HEPARANASE ACTIVITY AND [...]
OF SAME IN GENETICALLY [...] CELLS






[...] filed March 1, 1999, claims priority from [...] continuation-in-[...]
filed August 31, [...], now abandoned, which [...] 08/922,[...]70,
[...] September [...] 1997, now U.S. Patent No. 5,968,822.






[...] relates to a polynucleotide, [...]. The invention further [...]
heparanase activity [...] protein having heparanase [...] uses.




[...] extra cellular matrix (ECM) of a wide range of [...]
invertebrate tissues (1-4). The basic HSPG structure includes a protein
core to which several linear heparan sulfate chains are covalently
attached. [...] repeating hexuronic and D-glucosamine disaccharide units
that are substituted to a [...] extent with N- and O-linked sulfate
moieties and N-[...]. Studies on the involvement of ECM molecules in
[...] attachment, growth and differentiation revealed a [...]
angiogenesis, neurite outgrowth and tissue [...].

HSPG are prominent components of blood [...]. In large blood [...]
where they support prolifer[...]. [...] interact with ECM [...]
fibronectin, and with different attachment sites on [...] suggests a key
role for this proteoglycan in the self-assembly and insolubility of ECM
components, as well as [...] locomotion. Cleavage of the heparan sulfate
(HS) [...] result in degradation of the subendothelial ECM and hence
[...] play a decisive role in extravasation of blood-borne cells. HS
catabolism is

observed in inflammation, wound repair, diabetes, and cancer metastasis,
suggesting that enzymes which degrade HS play important roles in
pathologic processes. Heparanase activity has been described in activated
immune system cells and highly metastatic cancer cells (6-8), but research
has been handicapped by the lack of biologic tools to explore potential
causative roles of heparanase in disease conditions.

Involvement of Heparanase in Tumor Cell Invasion and
Metastasis: Circulating tumor cells arrested in the capillary beds of
different organs must invade the endothelial cell lining and degrade its
underlying basement membrane (BM) in order to invade into the
extravascular tissue(s) where they establish metastasis (9, 10). Metastatic
tumor cells often attach at or near the intercellular junctions between
adjacent endothelial cells. Such attachment of the metastatic cells is
followed by rupture of the junctions, retraction of the endothelial cell
borders and migration through the breach in the endothelium toward the
exposed underlying BM (9). Once located between endothelial cells and the
BM, the invading cells must degrade the subendothelial glycoproteins and
proteoglycans of the BM in order to migrate out of the vascular
compartment. Several cellular enzymes (e.g., collagenase IV, plasminogen
activator, cathepsin B, elastase, etc.) are thought to be involved in
degradation of BM (10). Among these enzymes is an endo-β-D-glucuronidase
(heparanase) that cleaves HS at specific intrachain sites (6, 8, 11).
Expression of a HS degrading heparanase was found to correlate with the
metastatic potential of mouse lymphoma (11), fibrosarcoma and melanoma (8)
cells. Moreover, elevated levels of heparanase were detected in sera from
metastatic tumor bearing animals and melanoma patients (8) and in tumor
biopsies of cancer patients (12).

The control of cell proliferation and tumor progression by the local
microenvironment, focusing on the interaction of cells with the
extracellular matrix (ECM) produced by cultured corneal and vascular
endothelial cells, was investigated previously by the present inventors.
This cultured ECM closely resembles the subendothelium in vivo in its
morphological appearance and molecular composition. It contains
collagens (mostly type III and IV, with smaller amounts of types I and V),
proteoglycans (mostly heparan sulfate- and dermatan sulfate- proteoglycans,
with smaller amounts of chondroitin sulfate proteoglycans), laminin,
fibronectin, entactin and elastin (13, 14). The ability of cells to degrade
HS in the cultured ECM was studied by allowing cells to interact with a
metabolically sulfate labeled ECM, followed by gel filtration (Sepharose
6B) analysis of degradation products released into the culture medium (11).
While intact HSPG are eluted next to the void volume of the column
(Kav<0.2, Mr ~0.5x10^6), labeled degradation fragments of HS side chains
are eluted more toward the Vt of the column (0.5<Kav<0.8, Mr = 5-7x10^3)
(11).
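
The partition coefficient Kav used above is conventionally computed from elution volumes as Kav = (Ve - V0)/(Vt - V0), where V0 is the void volume and Vt the total column volume. As an illustration only, the following Python sketch applies that standard definition together with the Kav ranges quoted in the text. The function names and the example volumes are hypothetical and are not taken from reference 11.

```python
def kav(elution_volume, void_volume, total_volume):
    """Standard gel filtration partition coefficient: (Ve - V0) / (Vt - V0)."""
    if total_volume <= void_volume:
        raise ValueError("total volume must exceed void volume")
    return (elution_volume - void_volume) / (total_volume - void_volume)


def classify_peak(k):
    """Label a peak using the Kav ranges quoted in the text."""
    if k < 0.2:
        return "intact HSPG (near void volume)"
    if 0.5 < k < 0.8:
        return "HS degradation fragments"
    return "unassigned"


# Hypothetical volumes in ml, chosen only for illustration.
V0, VT = 10.0, 30.0
for ve in (12.0, 23.0):
    k = kav(ve, V0, VT)
    print(f"Ve={ve} ml  Kav={k:.2f}  {classify_peak(k)}")
```

A peak eluting close to V0 gives a Kav near 0, and one eluting close to Vt gives a Kav near 1, which is why release of small HS fragments shows up as a shift toward higher Kav.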



The heparanase inhibitory effect of various non-anticoagulant
species of heparin that might be of potential use in preventing
extravasation of blood-borne cells was also investigated by the present
inventors. Inhibition of heparanase was best achieved by heparin species
containing 16 sugar units or more and having sulfate groups at both the N
and O positions. While O-desulfation abolished the heparanase inhibiting
effect of heparin, O-sulfated, N-acetylated heparin retained a high
inhibitory activity, provided that the N-substituted molecules had a
molecular size of about 4,000 daltons or more (7). Treatment of
experimental animals with heparanase inhibitors (e.g., non-anticoagulant
species of heparin) markedly reduced (>90%) the incidence of lung
metastases induced by B16 melanoma, Lewis lung carcinoma and mammary
adenocarcinoma cells (7, 8, 16). Heparin fractions with high and low
affinity to anti-thrombin III exhibited a comparable high anti-metastatic
activity, indicating that the heparanase inhibiting activity of heparin,
rather than its anticoagulant activity, plays a role in the
anti-metastatic properties of the polysaccharide (7).

Heparanase activity in the urine of cancer patients: In an attempt
to further elucidate the involvement of heparanase in tumor progression
and its relevance to human cancer, urine samples for heparanase activity
were screened (16a). Heparanase activity was detected in the urine of some,
but not all, cancer patients. High levels of heparanase activity were
determined in the urine of patients with an aggressive metastatic disease
and there was no detectable activity in the urine of healthy donors.

Heparanase activity was also found in the urine of 20% of normal
and microalbuminuric insulin dependent diabetes mellitus (IDDM) patients,
most likely due to diabetic nephropathy, the most important single disorder
leading to renal failure in adults.

Possible involvement of heparanase in tumor angiogenesis:
Fibroblast growth factors are a family of structurally related polypeptides
characterized by high affinity to heparin (17). They are highly mitogenic
for vascular endothelial cells and are among the most potent inducers of
neovascularization (17, 18). Basic fibroblast growth factor (bFGF) has been
extracted from the subendothelial ECM produced in vitro (19) and from
basement membranes of the cornea (20), suggesting that ECM may serve as
a reservoir for bFGF. Immunohistochemical staining revealed the
localization of bFGF in basement membranes of diverse tissues and blood
vessels (21). Despite the ubiquitous presence of bFGF in normal tissues,
endothelial cell proliferation in these tissues is usually very low,
suggesting that bFGF is somehow sequestered from its site of action.
Studies on the [...] ECM and can be released in an active form by HS
degrading enzymes (15, 20, 22). It was demonstrated that heparanase
activity expressed by platelets, mast cells, neutrophils, and lymphoma
cells is involved in release of active bFGF from ECM and basement
membranes (23), suggesting that [...] but may also elicit an indirect
neovascular response. These results suggest [...] other heparin-binding
growth promoting factors (24, 25). Displacement of bFGF from its storage
within basement membranes and ECM may therefore provide a novel mechanism
for induction of neovascularization in normal and pathological situations.

[...] of bFGF to high affinity cell surface receptors and in bFGF cell
signaling (26, 27). Moreover, the size of HS required for optimal effect
was similar to that of HS fragments released by heparanase (28). Similar
results were obtained with vascular endothelial cells growth factor (VEGF)
(29), suggesting the operation of a dual receptor mechanism involving HS in
cell interaction with heparin-binding growth factors. It is therefore
proposed that restriction of endothelial cell growth factors in ECM
prevents their systemic action on the vascular endothelium, thus
maintaining a very low rate of endothelial cells turnover and vessel
growth. On the other hand, release of bFGF from storage in ECM as a complex
with HS fragment, may elicit localized endothelial cell proliferation and
neovascularization in processes such as wound healing, inflammation and
tumor development (24, 25).

Expression of heparanase by cells of the immune system:
Heparanase activity correlates with the ability of activated cells of the
immune system to leave the circulation and elicit both inflammatory and
autoimmune responses. Interaction of platelets, granulocytes, T and B
lymphocytes, macrophages and mast cells with the subendothelial ECM is
associated with degradation of HS by a specific heparanase activity (6).
The enzyme is released from intracellular compartments (e.g., lysosomes,
specific granules, etc.) in response to various activation signals (e.g.,
thrombin, calcium ionophore, immune complexes, antigens, mitogens, etc.),
suggesting its regulated involvement in inflammation and cellular immunity.

Some of the observations regarding the heparanase enzyme were
reviewed in reference No. 6 and are listed hereinbelow:

First, a proteolytic activity (plasminogen activator) and heparanase
participate synergistically in sequential degradation of the ECM HSPG by
inflammatory leukocytes and malignant cells.

Second, a large proportion of the platelet heparanase exists in a latent
form, probably as a complex with chondroitin sulfate. The latent enzyme is
activated by tumor cell-derived factor(s) and may then facilitate cell
invasion through the vascular endothelium in the process of tumor
metastasis.



Third, release of the platelet heparanase from α-granules is induced
by a strong stimulant (i.e., thrombin), but not in response to platelet
activation on ECM.

Fourth, [...] preferentially and readily released in response to a
threshold activation and upon incubation of the cells on ECM.

Fifth, contact of neutrophils with ECM inhibited release of noxious
enzymes (proteases, lysozyme) and oxygen radicals, but not of enzymes
(heparanase, gelatinase) which may enable diapedesis. This protective role
of the subendothelial ECM was observed when the cells were stimulated
with soluble factors but not with phagocytosable [...].

Sixth, [...] exposure of T cell lines to specific antigens.

Seventh, mitogens (Con A, LPS) induce synthesis and secretion of
[...] by normal T and B lymphocytes maintained in vitro. [...]
lymphocyte heparanase is also induced by immunization with antigen in
vivo.



Eighth, heparanase activity is expressed by pre-B lymphomas and B-[...].

Ninth, heparanase activity is expressed by activated macrophages
during incubation with ECM, but there was little or no release of the
enzyme into the incubation medium. Similar results were obtained with
human myeloid leukemia cells induced to differentiate to mature
macrophages.

Tenth, T-cell mediated delayed type hypersensitivity and
experimental [...] suppressed by low [...]
inhibiting non-anticoagulant species of heparin (30).

Eleventh, heparanase activity expressed by platelets, neutrophils and
metastatic tumor cells releases active bFGF from ECM and basement
membranes. Release of bFGF from storage in ECM may elicit a localized
neovascular response in processes such as wound healing, inflammation and
tumor development.

Twelfth, among the breakdown products of the ECM generated by
heparanase is a tri-sulfated disaccharide that can inhibit T-cell mediated
inflammation in vivo [...] effect of the disaccharide on the production of
biologically active [...] by activated T cells in vitro (31).

Other potential therapeutic applications: Apart from its
involvement in tumor cell metastasis, inflammation and autoimmunity,
mammalian heparanase may be applied to modulate: bioavailability of
heparin-binding growth factors (15); cellular responses to heparin-binding
growth factors (e.g., bFGF, VEGF) and cytokines (IL-8) (31a, 29); cell
interaction with plasma lipoproteins (32); cellular susceptibility to
certain viral and some bacterial and protozoa infections (33, 33a, 33b);
and disintegration of amyloid plaques (34). Heparanase may thus prove
useful for conditions such as wound healing, angiogenesis, restenosis,
atherosclerosis, inflammation, neurodegenerative diseases and viral
infections. Mammalian heparanase can be used to neutralize plasma
heparin, as a potential replacement of protamine. Anti-heparanase
antibodies may be applied for immunodetection and diagnosis of
micrometastases, autoimmune lesions and renal failure in biopsy specimens,
plasma samples, and body fluids. Common use in basic research is
expected.

The identification of the hpa gene encoding for heparanase enzyme

will enable the production of a recombinant enzyme in heterologous 
expression systems. Availability of the recombinant protein will pave the 
way for solving the protein structure function relationship and will provide a 
tool for developing new inhibitors. 
Viral Infection: The presence of heparan sulfate on cell surfaces
has been shown to be the principal requirement for the binding of Herpes
Simplex (33) and Dengue (33a) viruses to cells and for subsequent infection
of the cells. Removal of the cell surface heparan sulfate by heparanase may
therefore abolish virus infection. In fact, treatment of cells with bacterial
heparitinase (degrading heparan sulfate) or heparinase (degrading heparin)
reduced the binding of two related animal herpes viruses to cells and
rendered the cells at least partially resistant to virus infection (33). There
are some [...] HIV infection (33b).

Neurodegenerative diseases: Heparan sulfate proteoglycans were
[...]. Heparanase may disintegrate these amyloid plaques which are also
thought to play a role in the pathogenesis of Alzheimer's disease.

Restenosis and atherosclerosis: Proliferation of arterial smooth
muscle cells (SMCs) in response to endothelial injury and accumulation of
cholesterol rich lipoproteins are basic events in the pathogenesis of
atherosclerosis and restenosis (35). Apart from its involvement in SMC
proliferation (i.e., low affinity receptors for heparin-binding growth
factors), [...] expected to be highly atherogenic by promoting
accumulation of apoB and apoE rich [...] heparanase [...] SMC [...]



atherosclerosis.

Gene therapy:

The ultimate goal in the management of inherited as well as acquired
diseases is a rational therapy with the aim to eliminate the underlying
biochemical defects associated with the disease rather than symptomatic
treatment. Gene therapy is a promising candidate to meet these objectives.
Initially it was developed for treatment of genetic disorders; however, the
consensus view today is that it offers the prospect of providing therapy
for a variety of acquired diseases, including cancer, viral infections,
vascular diseases and neurodegenerative disorders.

The gene-based therapeutic can act either intracellularly, affecting
only the cells to which it is delivered, or extracellularly, using the
recipient cells as local endogenous factories for the therapeutic
product(s). The application of gene therapy may follow any of the following
strategies: (i) prophylactic gene therapy, such as using gene transfer to
protect cells against viral infection; (ii) cytotoxic gene therapy, such as
cancer therapy, where genes encode cytotoxic products to render the target
cells vulnerable to attack by the normal immune response; (iii) biochemical
correction, primarily for the treatment of single gene defects, where a
normal copy of the gene is added to the affected or other cells.

To allow efficient transfer of the therapeutic genes, a variety of gene
delivery techniques have been developed based on viral and non-viral
vector systems. The most widely used and most efficient systems for
delivering genetic material into target cells are viral vectors. So far, 329
clinical studies (phase I/II and II) with over 2,500 patients have been
initiated worldwide since 1989 (50).

The approach of gene addition poses serious barriers. The expression
of many genes is tightly regulated and context dependent, so achieving the
correct balance and function of expression is challenging. The gene itself is
often quite large, containing many exons and introns. The delivery vector is
usually a virus, which can infect with a high efficiency but may, on the
other hand, induce immunological response and consequently decrease
effectiveness, especially upon secondary administration. Most of the
current expression vector-based gene therapy protocols fail to achieve
clinically significant transgene expression required for treating genetic
diseases. Apparently, it is difficult to deliver enough virus to the right cell
type to elicit an effective and therapeutic effect (51).

Homologous recombination, which was initially considered to be of
limited use for gene therapy because of its low frequency in mammalian
cells, has recently emerged as a potential strategy for developing gene
therapy. Different approaches have been used to study homologous
recombination in mammalian cells; some involve DNA repair mechanisms.
These studies aimed at either gene disruption or gene correction and include
RNA/DNA chimeric oligonucleotides, small or large homologous DNA
fragments, or adeno-associated viral vectors. Most of these studies show a
reasonable frequency of homologous recombination, which warrants further
in vivo testing (52). Homologous recombination-based gene therapy has the
potential to develop into a powerful therapeutic modality for genetic
diseases. It can offer permanent expression and normal regulation of
corrected genes in appropriate cells or organs and probably can be used for
treating dominantly inherited diseases such as polycystic kidney disease.

Genomic sequences function in regulation of gene expression:
The efficient expression of therapeutic genes in target cells or tissues
is an important component of efficient and safe gene therapy. The
expression of genes is driven by the promoter region upstream of the coding
sequence, although regulation of expression may be supplemented by
farther upstream or downstream DNA sequences or DNA in the introns of
the gene. Since this important information is embedded in the DNA, the
description of gene structure is crucial to the analysis of gene regulation.
Characterization of cell specific or tissue specific promoters, as well as
other tissue specific regulatory elements, enables the use of such sequences
to direct efficient cell specific, or developmental stage specific gene
expression. This information provides the basis for targeting individual
genes and for control of their expression by exogenous agents, such as
drugs. Identification of transcription factors and other regulatory proteins
required for proper gene expression will point at new potential targets for
modulating gene expression, when so desired or required.

Efficient expression of many mammalian genes depends on the
presence of at least one intron. The expression of mouse thymidylate
synthase (TS) gene, for example, is greatly influenced by intron sequences.
The addition of almost any of the introns from the mouse TS gene to an
intronless TS minigene leads to a large increase in expression (42). The
involvement of intron 1 in the regulation of expression was demonstrated
for many other genes. In human factor IX (hFIX), intron 1 is able to
increase the expression level about 3 fold more as compared to that of the
hFIX cDNA (43). The expression enhancing activity of intron 1 is due to
efficient functional splicing sequences, present in the precursor mRNA. By
being efficiently assembled into spliceosome complexes, transcripts with
splicing sequences may be better protected in the nucleus from random
degradations, than those without such sequences (44).

A forward-inserted intron 1-carrying hFIX expression cassette was
suggested to be useful for directed gene transfer, while for retroviral-
mediated gene transfer system, reversely-inserted intron 1-carrying hFIX
expression cassette was considered (43).

A highly conserved cis-acting sequence element was identified in the
first intron of the mouse and rat c-Ha-ras, and in the first exon of Ha- and
Ki-ras genes of human, mouse and rat. This cis-acting regulatory sequence
confers strong transcription enhancer activity that is differentially
modulated by steroid hormones in metastatic and nonmetastatic
subpopulations. Perturbations in the regulatory activities of such cis-acting
sequences may play an important role in governing oncogenic potency of
Ha-ras through transcriptional control mechanisms (45).

Intron sequences affect tissue specific, as well as inducible gene
expression. A 182 bp intron 1 DNA segment of the mouse Col2a1 gene
contains the necessary information to confer high-level, temporally correct,
chondrocyte expression on a reporter gene in intact mouse embryos, while
Col2a1 promoter sequences are dispensable for chondrocyte expression
(46). In Col1A1 gene the intron plays little or no role in constitutive
expression of collagen in the skin, and in cultured cells derived from the
skin; however, in the lungs of young mice, intron deletion results in
decrease of expression to less than 50% (47).

A classical enhancer activity was shown in the 2 kb intron fragment
in bovine beta-casein gene. The enhancer activity was largely dependent on
the lactogenic hormones, especially prolactin. It was suggested that several
elements in the intron-1 of the bovine beta-casein gene cooperatively
interact not only with each other but also with its promoter for hormonal
induction (48).

Identification and characterization of regulatory elements in genomic
non-coding sequences, such as introns, provides a tool for designing and
constructing novel vectors for tissue specific, hormone regulated or any
other defined expression pattern, for gene therapy. Such an expression
cassette was developed, utilizing regulatory elements from the human
cytokeratin 18 (K18) gene, including 5' genomic sequences and one of its
[...] human cystic fibrosis transmembrane conductance regulator (CFTR)
gene, in cultured lung epithelial cells (49).

Alternative splicing:

Alternative splicing of pre mRNA is a powerful and versatile
regulatory mechanism that can effect quantitative control of gene expression
and functional diversification of proteins. It contributes to major
developmental decisions and also to a fine-tuning of gene function. Genetic
and biochemical approaches have identified cis-acting regulatory elements
and trans-acting factors that control alternative splicing of specific mRNA
[...] proteins from a single gene. These include cell surface molecules such
as CD44 receptors, cytokines such as VEGF and enzymes. Products of
alternatively spliced transcripts differ in their expression pattern, substrate
specificity and other biological parameters.

FGF receptor RNA undergoes alternative splicing which results
[...] specificities. The alternative splicing is regulated in a cell specific
manner (53).



Alternative spliced mRNAs are often correlated with malignancy.
An increase in specific splice variant of tyrosinase was identified in murine
melanomas (54). Multiple splicing variants of estrogen receptor are present
in individual human breast tumors. CD44 has various isoforms, some are
characteristic of malignant tissues.

Identification of tumor specific alternative splice variants provides
a new tool for cancer diagnostics. CD44 variants have been used for
detection of malignancy in urine samples from patients with urothelial
cancer by competitive RT-PCR (55). CD44 exon 6 was suggested as a
prognostic indicator of metastasis in breast cancer (56).

Different enzymes or polypeptides generated by alternative splicing
may have different function or catalytic specificity. The identification and
characterization of the enzyme forms, which are involved in pathological
processes, is crucial for the design of appropriate and efficient drugs.

Modulation of gene expression - Antisense technology:

An antisense oligonucleotide (e.g., antisense
oligodeoxyribonucleotide) may bind its target nucleic acid either by
Watson-Crick base pairing or Hoogsteen and anti-Hoogsteen base pairing
(64). According to the Watson-Crick base pairing, heterocyclic bases of the
antisense oligonucleotide form hydrogen bonds with the heterocyclic bases
of target single-stranded nucleic acids (RNA or single-stranded DNA),
whereas according to the Hoogsteen base pairing, the heterocyclic bases of
the target nucleic acid are [...]
accommodated in the major groove of the B-form DNA duplex by
Hoogsteen and anti-Hoogsteen base pairing to form a triple helix structure.
According to both the Watson-Crick and the Hoogsteen base pairing
models, antisense oligonucleotides have the potential to regulate gene
expression and to disrupt the essential functions of the nucleic acids in cells.
Therefore, antisense oligonucleotides have possible uses in modulating a
wide range of diseases in which gene expression is altered.
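
Watson-Crick pairing fixes the sequence of an antisense oligodeoxyribonucleotide once the target stretch is chosen: it is the reverse complement of that stretch. The following is a minimal Python sketch of that rule, assuming the target is a plain A/C/G/U string written 5' to 3'. The function name and the six-base example target are hypothetical and serve only as an illustration.

```python
# Watson-Crick partners for an RNA target read by a DNA oligonucleotide.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def antisense_oligo(target_rna):
    """Return the antisense DNA oligo (5'->3') for an RNA target (5'->3')."""
    target = target_rna.upper()
    unknown = set(target) - set(RNA_TO_DNA_COMPLEMENT)
    if unknown:
        raise ValueError(f"unexpected bases: {sorted(unknown)}")
    # The two strands pair antiparallel, so complement each base and reverse.
    return "".join(RNA_TO_DNA_COMPLEMENT[b] for b in reversed(target))


print(antisense_oligo("AUGGCC"))  # GGCCAT
```

The sketch covers only Watson-Crick pairing to a single-stranded target; triple helix formation by Hoogsteen pairing follows different pairing rules and is not modeled here.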

Since the development of effective methods for chemically
synthesizing oligonucleotides, these molecules have been extensively used
in biochemistry and biological research and have the potential use in
medicine, since carefully devised oligonucleotides can be used to control
gene expression by regulating levels of transcription, transcripts and/or
translation.

Oligodeoxyribonucleotides as long as 100 base pairs (bp) are
routinely synthesized by solid phase methods using commercially available,
fully automated synthesis machines. The chemical synthesis of
oligoribonucleotides, however, is far less routine. Oligoribonucleotides are
also much less stable than oligodeoxyribonucleotides, a fact which has
contributed to the more prevalent use of oligodeoxyribonucleotides in
medical and biological research, directed at, for example, the regulation of
transcription or translation levels.



Gene expression involves few distinct and well regulated steps. The
first major step of gene expression involves transcription of a messenger
RNA (mRNA) which is an RNA sequence complementary to the antisense
(i.e., -) DNA strand, or, in other words, identical in sequence to the DNA
sense (i.e., +) strand, composing the gene. In eukaryotes, transcription
occurs in the cell nucleus.
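
The strand relationship described above can be checked with a few lines of Python. This is a sketch under simple assumptions: sequences are plain uppercase strings written 5' to 3', "identical in sequence" is taken to mean identical apart from uracil replacing thymine, and the six-base sense fragment is a hypothetical example.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def antisense_strand(sense_dna):
    """Antisense (-) strand, written 5'->3', for a sense (+) strand."""
    return "".join(DNA_COMPLEMENT[b] for b in reversed(sense_dna))


def transcribe(antisense_dna):
    """mRNA (5'->3') copied from the antisense (-) template strand."""
    return "".join(DNA_TO_RNA_COMPLEMENT[b] for b in reversed(antisense_dna))


sense = "ATGGCC"  # hypothetical sense-strand fragment
mrna = transcribe(antisense_strand(sense))
print(mrna)  # AUGGCC
```

The mRNA is complementary to the antisense strand and reads the same as the sense strand with U in place of T.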

The second major step of gene expression involves translation of a
protein (e.g., enzymes, structural proteins, secreted proteins, gene [...])
[...] mRNA [...] direct the synthesis [...].

Initiation of transcription requires specific recognition of a promoter
[...] RNA-synthesizing enzyme -- RNA polymerase. This recognition is
preceded by sequence-specific binding of one or more transcription factors
[...] promoter sequence may trans upregulate transcription via cis elements
[...] promoter, [...] binding [...] the action of RNA polymerase, [...]
known as repressors.

There is also evidence that in some cases gene expression is
downregulated by endogenous antisense RNA repressors that bind a
complementary mRNA transcript and thereby prevent its translation into a
functional protein.






Thus, gene expression is typically upregulated by transcription 
factors and enhancers and downregulated by repressors. 

However, in many disease situations gene expression is impaired. In
many cases, such as different types of cancer, for various reasons the
expression of a specific endogenous or exogenous (e.g., of a pathogen such
as a virus) gene is upregulated. Furthermore, in infectious diseases caused
by pathogens such as parasites, bacteria or viruses, the disease progression
depends on expression of the pathogen genes; this phenomenon may also be
considered, as far as the patient is concerned, as upregulation of exogenous
genes.



Most conventional drugs function by interaction with and modulation
of one or more targeted endogenous or exogenous proteins, e.g., enzymes.
Such drugs, however, typically are not specific for targeted proteins but
interact with other proteins as well. Thus, a relatively large dose of drug
must be used to effectively modulate a targeted protein.

Typical daily doses of drugs are from 10^-5 to 10^-1 millimoles per
kilogram of body weight or 10^-3 to 10 millimoles for a 100 kilogram person.
If this modulation instead could be effected by interaction with and
inactivation of mRNA, a dramatic reduction in the necessary amount of
drug could likely be achieved, along with a corresponding reduction in side
effects. Further reductions could be effected if such interaction could be
rendered site-specific. Given that a functioning gene continually produces
mRNA, it would thus be even more advantageous if gene transcription
could be arrested in its entirety.
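
The dose conversion quoted above is a straight multiplication of the per-kilogram dose by body weight. A short Python check follows, using exact fractions to avoid floating-point noise; the function and variable names are illustrative only.

```python
from fractions import Fraction


def total_dose_mmol(dose_per_kg_mmol, body_weight_kg):
    """Daily dose in millimoles for a given body weight."""
    return dose_per_kg_mmol * body_weight_kg


low_per_kg = Fraction(1, 10**5)   # 10^-5 mmol per kg
high_per_kg = Fraction(1, 10)     # 10^-1 mmol per kg
weight_kg = 100

low = total_dose_mmol(low_per_kg, weight_kg)
high = total_dose_mmol(high_per_kg, weight_kg)
print(float(low), float(high))  # 0.001 10.0
```

The result, 10^-3 to 10 millimoles, matches the range stated for a 100 kilogram person.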

Given these facts, it would be advantageous if gene expression could
be arrested or downmodulated at the transcription level.

The ability of chemically synthesizing oligonucleotides and analogs
thereof having a selected predetermined sequence offers means for
downmodulating gene expression. Three types of gene expression
modulation strategies may be considered.

At the transcription level, antisense or sense oligonucleotides or
analogs that bind to the genomic DNA by strand displacement or the
formation of a triple helix, may prevent transcription (64).

At the transcript level, antisense oligonucleotides or analogs that
bind target mRNA molecules lead to the enzymatic cleavage of the hybrid
by intracellular RNase H (65). In this case, by hybridizing to the targeted
mRNA, the oligonucleotides or oligonucleotide analogs provide a duplex
hybrid recognized and destroyed by the RNase H enzyme. Alternatively,
such hybrid formation may lead to interference with correct splicing (66).
As a result, in both cases, the number of the target mRNA intact transcripts
ready for translation is reduced or eliminated.

At the translation level, antisense oligonucleotides or analogs that
bind target mRNA molecules prevent, by steric hindrance, binding of
essential translation factors (ribosomes), to the target mRNA, a
phenomenon known in the art as hybridization arrest, disabling the
translation of such mRNAs (67).
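The base pairing that underlies all three strategies can be illustrated as follows. This is a minimal sketch that derives a DNA oligonucleotide complementary to an mRNA segment; the target sequence is hypothetical and is not taken from any sequence of the present disclosure, and the function name is illustrative only.

```python
# Watson-Crick pairing of a DNA oligonucleotide against an RNA target.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def antisense_dna_oligo(mrna_segment: str) -> str:
    """Return the DNA oligo (5'->3') complementary to an mRNA segment (5'->3')."""
    segment = mrna_segment.upper()
    unknown = set(segment) - set(RNA_TO_DNA_COMPLEMENT)
    if unknown:
        raise ValueError(f"unexpected bases: {sorted(unknown)}")
    # Complement each base and reverse, so the result also reads 5'->3'.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(segment))


target = "AUGGCUCAGUACGGAUUC"  # hypothetical 18-nucleotide target segment
oligo = antisense_dna_oligo(target)
print(oligo)
```

Such a sketch addresses sequence complementarity only; it says nothing about the stability, solubility or cell penetration of the resulting oligonucleotide.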

Thus, antisense sequences, which as described hereinabove may
arrest the expression of any endogenous and/or exogenous gene depending
on their specific sequence, attracted much attention by scientists and
pharmacologists who were devoted to developing the antisense approach
into a new pharmacological tool (68).

For example, several antisense oligonucleotides have been shown to
arrest hematopoietic cell proliferation (69), growth (70), entry into the S
phase of the cell cycle (71), reduce survival (72) and prevent receptor
mediated responses (73). For use of antisense oligonucleotides as antiviral
agents the reader is referred to reference 74.

For efficient in vivo inhibition of gene expression using antisense
oligonucleotides or analogs, the oligonucleotides or analogs must fulfill the
following requirements: (i) sufficient specificity in binding to the target
sequence; (ii) solubility in water; (iii) stability against intra- and
extracellular nucleases; (iv) capability of penetration through the cell
membrane; and (v) when used to treat an organism, low toxicity.
Unmodified oligonucleotides are impractical for use as antisense
sequences since they have short in vivo half-lives, during which they are
degraded rapidly by nucleases. Furthermore, they are difficult to prepare in
more than milligram quantities. In addition, such oligonucleotides are poor
cell membrane penetrators (75).

Thus, in order to meet all of the above listed requirements,
oligonucleotide analogs need to be devised in a suitable
manner. Therefore, an extensive search for modified oligonucleotides has
been initiated.

For example, problems arising in connection with double-stranded
DNA (dsDNA) recognition through triple helix formation have been
diminished by "switch back" chemical linking, whereby a sequence
of polypurine on one strand is recognized, and by "switching back", a
homopurine sequence on the other strand can be recognized. Also, good
helix formation has been obtained by using artificial bases, thereby
improving binding conditions with regard to ionic strength and pH.

In addition, in order to improve half-life as well as membrane
penetration, a large number of variations in polynucleotide backbones have
been done, nevertheless with little success.

Oligonucleotides can be modified either in the base, the sugar or the
phosphate moiety. These modifications include, for example, the use of
methylphosphonates, monothiophosphates, dithiophosphates,
phosphoramidates, phosphate esters, bridged phosphorothioates, bridged
phosphoramidates, bridged methylenephosphonates, dephospho
internucleotide analogs with siloxane bridges, carbonate bridges,
carboxymethyl ester bridges, acetamide bridges, carbamate bridges,
thioether bridges, sulfoxy bridges, sulfono bridges, various "plastic" DNAs,
α-anomeric bridges and borane derivatives. For further details the reader is
referred to reference 76.

International patent application WO 89/12060 discloses various
building blocks for synthesizing oligonucleotide analogs, as well as
oligonucleotide analogs formed by joining such building blocks in a defined
sequence. The building blocks may be either "rigid" (i.e., containing a ring
structure) or "flexible" (i.e., lacking a ring structure). In both cases, the
building blocks contain a hydroxy group and a mercapto group, through
which the building blocks are said to join to form oligonucleotide analogs.
The linking moiety in the oligonucleotide analogs is selected from the group
consisting of sulfide (-S-), sulfoxide (-SO-), and sulfone (-SO2-). However,
the application provides no data supporting the specific binding of an
oligonucleotide analog to a target oligonucleotide.

International patent application WO 92/20702 describes an acyclic
oligonucleotide which includes a peptide backbone on which any selected
chemical nucleobases or analogs are strung and serve as coding characters
as they do in natural DNA or RNA. These new compounds, known as
peptide nucleic acids (PNAs), are not only more stable in cells than their
natural counterparts, but also bind natural DNA and RNA 50 to 100 times 
more tightly than the natural nucleic acids cling to each other (77). PNA 
oligomers can be synthesized from the four protected monomers containing 
thymine, cytosine, adenine and guanine by Merrifield solid-phase peptide 
synthesis. In order to increase solubility in water and to prevent 
aggregation, a lysine amide group is placed at the C-terminal. 

Thus, antisense technology requires pairing of messenger RNA with 
an oligonucleotide to form a double helix that inhibits translation. The 
concept of antisense-mediated gene therapy was already introduced in 1978 
for cancer therapy. This approach was based on certain genes that are 
crucial in cell division and growth of cancer cells. Synthetic fragments of 
genetic substance DNA can achieve this goal. Such molecules bind to the 
targeted gene molecules in RNA of tumor cells, thereby inhibiting the 
translation of the genes and resulting in dysfunctional growth of these cells. 
Other mechanisms have also been proposed. These strategies have been
used, with some success in treatment of cancers, as well as other illnesses,
including viral and other infectious diseases. Antisense oligonucleotides
are typically synthesized in lengths of 13-30 nucleotides. The life span of
oligonucleotide molecules in blood is rather short. Thus, they have to be
chemically modified to prevent destruction by ubiquitous nucleases present
in the body. Phosphorothioates are a very widely used modification in
ongoing antisense oligonucleotide clinical trials (57). A new generation of
antisense molecules consists of hybrid antisense oligonucleotides with a
central portion of synthetic DNA while four bases on each end have been
modified with 2'O-methyl ribose to resemble RNA. In preclinical studies in
laboratory animals, such compounds have demonstrated greater stability to
metabolism in body tissues and an improved safety profile when compared
with the first-generation unmodified phosphorothioate (Hybridon Inc.
news). Dozens of other nucleotide analogs have also been tested in
antisense technology.

RNA oligonucleotides may also be used for antisense inhibition as 
they form a stable RNA-RNA duplex with the target, suggesting efficient 
inhibition. However, due to their low stability RNA oligonucleotides are 
typically expressed inside the cells using vectors designed for this purpose. 
This approach is favored when attempting to target an mRNA that encodes
an abundant and long-lived protein (57).

Recent scientific publications have validated the efficacy of antisense
compounds in animal models of hepatitis, cancers, coronary artery
restenosis and other diseases. The first antisense drug was recently
approved by the FDA. This drug, Fomivirsen, developed by Isis, is indicated
for local treatment of cytomegalovirus in patients with AIDS who are
intolerant of or have a contraindication to other treatments for CMV
retinitis (Pharmacotherapy News Network).

Several antisense compounds are now in clinical trials in the United
States. These include locally administered antivirals and systemic cancer
therapeutics. Antisense therapeutics has the potential to treat many
life-threatening diseases with a number of advantages over traditional drugs.
Traditional drugs intervene after a disease-causing protein is formed.
Antisense therapeutics, however, block mRNA transcription/translation and
intervene before a protein is formed, and since antisense therapeutics target
only one specific mRNA, they should be more effective with fewer side
effects than current protein-inhibiting therapy.

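A second option for disrupting gene expression at the level of
transcription uses synthetic oligonucleotides capable of hybridizing with
double stranded DNA. A triple helix is formed. Such oligonucleotides may
prevent binding of transcription factors to the gene's promoter and,
therefore, inhibit transcription. Alternatively, they may prevent duplex
unwinding and, therefore, transcription of genes within the triple helical
structure.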

Another approach is the use of specific nucleic acid sequences to act
as decoys for transcription factors. Since transcription factors bind specific
DNA sequences it is possible to synthesize oligonucleotides that will
effectively compete with the native DNA sequences for available
transcription factors in vivo. This approach requires the identification of
gene specific transcription factor (57).

Indirect inhibition of gene expression was demonstrated for matrix
metalloproteinase genes (MMP-1, -3, and -9), which are associated with
invasive potential of human cancer cells. E1AF is a transcription activator
of MMP genes. Expression of E1AF antisense RNA in HSC3AS cells
showed decrease in mRNA and protein levels of MMP-1, -3, and -9.
Moreover, HSC3AS showed lower invasive potential in vitro and in vivo.
These results imply that transfection of antisense inhibits tumor invasion by
down-regulating MMP genes (58).

Ribozymes:

Ribozymes are being increasingly used for the sequence-specific
inhibition of gene expression by the cleavage of mRNAs encoding proteins
of interest. The possibility of designing ribozymes to cleave any specific
target RNA has rendered them valuable tools in both basic research and
therapeutic applications. In the therapeutics area, ribozymes have been
exploited to target viral RNAs in infectious diseases, dominant oncogenes
in cancers and specific somatic mutations in genetic disorders. Most
notably, several ribozyme gene therapy protocols for HIV patients are
already in Phase 1 trials (62). More recently, ribozymes have been used for
transgenic animal research, gene target validation and pathway elucidation.
Several ribozymes are in various stages of clinical trials. ANGIOZYME
was the first chemically synthesized ribozyme to be studied in human
clinical trials. ANGIOZYME specifically inhibits formation of the VEGF-r
(Vascular Endothelial Growth Factor receptor), a key component in the
angiogenesis pathway. Ribozyme Pharmaceuticals, Inc., as well as other
firms have demonstrated the importance of anti-angiogenesis therapeutics in
animal models. HEPTAZYME, a ribozyme designed to selectively destroy
Hepatitis C Virus (HCV) RNA, was found effective in decreasing Hepatitis
C viral RNA in cell culture assays (Ribozyme Pharmaceuticals,
Incorporated - WEB home page).

Gene disruption in animal models: 

The emergence of gene inactivation by homologous recombination 
methodology in embryonic stem cells has revolutionized the field of mouse 
genetics. The availability of a rapidly growing number of mouse null 
mutants has represented an invaluable source of knowledge on mammalian 
development, cellular biology and physiology, and has provided many
models for human inherited diseases. Animal models are required for an
effective drug delivery development program and evaluation of gene
therapy approach. The improvement of the original knockout strategy, as
well as exploitation of exogenous enzymatic systems that are active in the
recombination process, has considerably extended the range of genetic
manipulations that can be produced. Additional methods have been
developed to provide versatile research tools: double replacement method,
sequential gene targeting, conditional cell type specific gene targeting,
single copy integration method, inducible gene targeting, gene disruption by
viral delivery, replacing one gene with another, the so called knock-in
method and the induction of specific balanced chromosomal translocation.
It is now possible to introduce a point mutation as a unique change in the
entire genome, therefore allowing very fine dissection of gene function in
vivo. Furthermore, the advent of methods allowing conditional gene
targeting opens the way for analysis of consequence of a particular mutation
in a defined organ and at a specific time during the life of the experimental
animal (59).

DNA vaccination: 

Observations in the early 1990s that plasmid DNA could directly 
transfect animal cells in vivo sparked exploration of the use of DNA 
plasmids to induce immune response by direct injection into animal of DNA
encoding antigenic protein. When a DNA vaccine plasmid enters the
eukaryotic cell, the protein it encodes is transcribed and translated within
the cell. In the case of pathogens, these proteins are presented to the
immune system in their native form, mimicking the presentation of antigens
during a natural infection. DNA vaccination is particularly useful for the
induction of T cell activation. It was applied for viral and bacterial
infectious diseases, as well as for allergy and for cancer. The central
hypothesis behind active specific immunotherapy for cancer is that tumor
cells express unique antigens that should stimulate the immune system. The
first DNA vaccine against tumor was carcino-embryonic antigen (CEA).
DNA vaccinated animals expressed immunoprotection and immunotherapy
of human CEA-expressing syngeneic mouse colon and breast carcinoma
(61). In a mouse model of neuroblastoma, DNA immunization with HuD
resulted in tumor growth inhibition with no neurological disease (60).
Immunity to the brown locus protein, gp75 tyrosinase-related protein-1,
associated with melanoma, was investigated in a syngeneic mouse model.
Priming with human gp75 DNA broke tolerance to mouse gp75. Immunity
against mouse gp75 provided significant tumor protection (60).

Glycosyl hydrolases:

Glycosyl hydrolases are a widespread group of enzymes that
hydrolyze the O-glycosidic bond between two or more carbohydrates or
a carbohydrate and a noncarbohydrate moiety. The enzymatic
hydrolysis of the glycosidic bond occurs by one of two major mechanisms
leading to overall retention or inversion of the anomeric configuration. In
both mechanisms catalysis involves two residues: a proton donor and a
nucleophile. Glycosyl hydrolases have been classified into 58 families
based on amino acid similarities. The glycosyl hydrolases from families 1,
2, 5, 10, 17, 30, 35, 39 and 42 act on a large variety of substrates; however,
they all hydrolyze the glycosidic bond in a general acid catalysis
mechanism, with retention of the anomeric configuration. The mechanism
involves two glutamic acid residues, which are the proton donor and the
nucleophile, with an asparagine always preceding the proton donor.
Analyses of a set of known 3D structures from this group revealed that their
catalytic domains, despite the low level of sequence identity, adopt a similar
(α/β)8 fold with the proton donor and the nucleophile located at the
C-terminal ends of strands β4 and β7, respectively. Mutations in the
functional conserved amino acids of lysosomal glycosyl hydrolases were
identified in lysosomal storage diseases.

Lysosomal glycosyl hydrolases including β-glucuronidase,
β-mannosidase, β-glucocerebrosidase, β-galactosidase and α-L iduronidase,
are all exo-glycosyl hydrolases, belong to the GH-A clan and share a similar
catalytic site. However, many endo-glucanases from various organisms,
such as bacterial and fungal xylanases and cellulases share this catalytic
domain.

Genomic sequence of hpa gene and its implications:

It is well established that heparanase activity is correlated with
cancer metastasis. This correlation was demonstrated at the level of
enzymatic activity as well as the levels of protein and hpa cDNA expression
in highly metastatic cancer cells as compared with non-metastatic cells. As
such, inhibition of heparanase activity is desirable, and has been attempted
by several means. The genomic region encoding the hpa gene and its
surroundings provides a new powerful tool for regulation of heparanase
activity at the level of gene expression. Regulatory sequences may reside in
noncoding regions both upstream and downstream of the transcribed region
as well as in intron sequences. A DNA sequence upstream of the transcription
start site contains the promoter region and potential regulatory elements.
Regulatory factors which interact with the promoter region may be
identified and be used as potential drugs for inhibition of cancer, metastasis
and inflammation. The promoter region can be used to screen for inhibitors
of heparanase gene expression. Furthermore, the hpa promoter can be used
to direct cell specific, particularly cancer cell specific, expression of foreign
genes, such as cytotoxic or apoptotic genes, in order to specifically destroy
cancer cells.

Cancer and yet unknown related genetic disorders may involve
rearrangements and mutations in the heparanase gene, either in coding or
non-coding regions. Such mutations may affect expression level or
enzymatic activity. The genomic sequence of hpa enables the amplification
of specific genomic DNA fragments, identification and diagnosis of
mutations.

There is thus a widely recognized need for, and it would be highly
advantageous to have genomic, cDNA and composite polynucleotides
encoding a polypeptide having heparanase activity, vectors including same,
genetically modified cells expressing heparanase and a recombinant protein
having heparanase activity, as well as antisense oligonucleotides, constructs
and ribozymes which can be used for down regulation of heparanase
activity.

SUMMARY OF THE INVENTION

Cloning of the human hpa gene which encodes heparanase, and
expression of recombinant heparanase by transfected host cells is reported
herein, as well as downregulation of heparanase activity by antisense
technology.

A purified preparation of heparanase isolated from human hepatoma
cells was subjected to tryptic digestion and microsequencing. The
YGPDVGQPR (SEQ ID NO:8) sequence revealed was used to screen EST
databases for homology to the corresponding back translated DNA
sequence. Two closely related EST sequences were identified and were
thereafter found to be identical. Both clones contained an insert of 1020 bp
which included an open reading frame of 973 bp followed by a 27 bp of 3'
untranslated region and a Poly A tail. Translation start site was not
identified.
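Back translation of the tryptic peptide can be illustrated as follows. This is a minimal sketch that enumerates the DNA sequences able to encode YGPDVGQPR under the standard genetic code; it is not the database search actually used, and all names are illustrative only.

```python
from itertools import product

# Standard genetic code codons (DNA alphabet) for the residues in the peptide.
CODONS = {
    "Y": ["TAT", "TAC"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "P": ["CCT", "CCC", "CCA", "CCG"],
    "D": ["GAT", "GAC"],
    "V": ["GTT", "GTC", "GTA", "GTG"],
    "Q": ["CAA", "CAG"],
    "R": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
}


def back_translations(peptide: str):
    """Yield every DNA sequence that encodes the given peptide."""
    for codons in product(*(CODONS[residue] for residue in peptide)):
        yield "".join(codons)


def degeneracy(peptide: str) -> int:
    """Count the distinct DNA sequences that encode the peptide."""
    count = 1
    for residue in peptide:
        count *= len(CODONS[residue])
    return count


PEPTIDE = "YGPDVGQPR"  # SEQ ID NO:8 as given in the text
print(degeneracy(PEPTIDE), "possible coding sequences of", 3 * len(PEPTIDE), "nt")
```

Because many DNA sequences encode the same nine residues, screening a nucleotide database with a peptide is usually done by translated comparison rather than by listing every back translation.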

Cloning of the missing 5' end of hpa was performed by PCR
amplification of DNA from placenta Marathon RACE cDNA composite
using primers selected according to the EST clones sequence and the linkers
of the composite. A 900 bp PCR fragment, partially overlapping with the
identified 3' encoding EST clones was obtained. The joined cDNA
fragment (hpa), 1721 bp long (SEQ ID NO:9), contained an open reading
frame which encodes a polypeptide of 543 amino acids (SEQ ID NO:10)
with a calculated molecular weight of 61,192 daltons.

Cloning an extended 5' sequence was enabled from the human SK-hep1
cell line by PCR amplification using the Marathon RACE. The 5'
extended sequence of the SK-hep1 hpa cDNA was assembled with the
sequence of the hpa cDNA isolated from human placenta (SEQ ID NO:9).
The assembled sequence contained an open reading frame, SEQ ID NOs:13
and 15, which encodes, as shown in SEQ ID NOs:14 and 15, a polypeptide
of 592 amino acids with a calculated molecular weight of 66,407 daltons.
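The figures quoted above for the two open reading frames can be cross-checked with simple arithmetic. The following is a minimal sketch; the helper names are illustrative only, and the comparison against an average residue mass of roughly 110 daltons is a common rule of thumb rather than a statement of the present disclosure.

```python
def coding_length_nt(n_residues: int, include_stop: bool = True) -> int:
    """Nucleotides needed to encode n_residues, optionally with a stop codon."""
    return 3 * (n_residues + (1 if include_stop else 0))


def mean_residue_mass(molecular_weight_da: float, n_residues: int) -> float:
    """Average mass per amino acid residue, in daltons."""
    return molecular_weight_da / n_residues


# Values as stated in the text.
placenta = {"residues": 543, "mw": 61192, "cdna_bp": 1721}
extended = {"residues": 592, "mw": 66407}

orf_nt = coding_length_nt(placenta["residues"])
print("543-residue ORF with stop codon:", orf_nt, "nt of", placenta["cdna_bp"], "bp")
for label, clone in (("placenta", placenta), ("extended", extended)):
    mass = mean_residue_mass(clone["mw"], clone["residues"])
    print(f"{label}: {mass:.1f} Da per residue")
```

Both calculated molecular weights correspond to an average of about 112 daltons per residue, and the 543 residue reading frame fits within the 1721 bp joined cDNA.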

The ability of the hpa gene product to catalyze degradation of
heparan sulfate in an in vitro assay was examined by expressing the entire
open reading frame of hpa in insect cells, using the Baculovirus expression
system. Extracts and conditioned media of cells infected with virus
containing the hpa gene demonstrated a high level of heparan sulfate
degradation activity both towards soluble ECM-derived HSPG and intact
ECM. This degradation activity was inhibited by heparin, which is another
substrate of heparanase. Cells infected with a similar construct containing
no hpa gene had no such activity, nor did non-infected cells. The activity of
heparanase expressed from the extended 5' clone towards heparin was
demonstrated in a mammalian expression system.

The expression pattern of hpa RNA in various tissues and cell lines
was investigated using RT-PCR. It was found to be expressed only in
tissues and cells previously known to have heparanase activity.

A panel of monochromosomal human/CHO and human/mouse
somatic cell hybrids was used to localize the human heparanase gene to
human chromosome 4. The newly isolated heparanase sequence can be
used to identify a chromosome region harboring a human heparanase gene
in a chromosome spread.

A human genomic library was screened and the human locus 
harboring the heparanase gene isolated, sequenced and characterized. 
Alternatively spliced heparanase mRNAs were identified and characterized. 
The human heparanase promoter has been isolated, identified and positively 
tested for activity. The mouse heparanase promoter has been isolated and 
identified as well. Antisense heparanase constructs were prepared and their
influence on cells in vitro tested. A predicted heparanase active site was
identified. And finally, the presence of sequences hybridizing with human
heparanase sequences was demonstrated for a variety of mammalians and
for an avian.

According to one aspect of the present invention there is provided an
isolated nucleic acid comprising a genomic, complementary or composite
polynucleotide sequence encoding a polypeptide having heparanase
catalytic activity.

According to further features in preferred embodiments of the
invention described below, the polynucleotide or a portion thereof is
hybridizable with SEQ ID NOs: 9, 13, 42, 43 or a portion thereof at 68 °C in
6 x SSC, 1 % SDS, 5 x Denhardt's, 10 % dextran sulfate, 100 µg/ml salmon
sperm DNA, and 32P labeled probe and wash at 68 °C with 3 x SSC and 0.1
% SDS.

According to still further features in the described preferred
embodiments the polynucleotide or a portion thereof is at least 60 %
identical with SEQ ID NOs: 9, 13, 42, 43 or portions thereof as determined
using the Bestfit procedure of the DNA sequence analysis software package
developed by the Genetic Computer Group (GCG) at the University of
Wisconsin (gap creation penalty - 12, gap extension penalty - 4).
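The 60 % identity criterion can be illustrated on a pair of sequences that have already been aligned. The following is a minimal sketch only: it does not implement the Bestfit procedure or its gap penalties, the example sequences are hypothetical, and excluding gap columns from the denominator is an assumption, since alignment programs differ on this convention.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical positions over the non-gap columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gap columns are left out of the denominator
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 60.0) -> bool:
    """True if the aligned pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical aligned fragments; "-" marks a gap introduced by the aligner.
print(percent_identity("ACGT-ACGT", "ACGTTACGA"))
```

In practice the alignment itself, and therefore the reported identity, depends on the scoring parameters named in the text, so a value computed this way is only comparable when the same alignment is used.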

According to still further features in the described preferred
embodiments the polypeptide is as set forth in SEQ ID NOs:10, 14, 44 or
portions thereof.

According to still further features in the described preferred
embodiments the polypeptide is at least 60 % homologous to SEQ ID
NOs:10, 14, 44 or portions thereof as determined with the Smith-Waterman
algorithm, using the Bioaccelerator platform developed by Compugene
(gapop: 10.0, gapext: 0.5, matrix: blosum62).

According to additional aspects of the present invention there are
provided a nucleic acid construct (vector) comprising the isolated nucleic
acid described herein and a host cell comprising the construct.

According to a further aspect of the present invention there is
provided an antisense oligonucleotide comprising a polynucleotide or a
polynucleotide analog of at least 10 bases being hybridizable in vivo, under
physiological conditions, with a portion of a polynucleotide strand encoding
a polypeptide having heparanase catalytic activity.

According to an additional aspect of the present invention there is
provided a method of in vivo downregulating heparanase activity
comprising the step of in vivo administering the antisense oligonucleotide
herein described.

According to yet an additional aspect of the present invention there is
provided a pharmaceutical composition comprising the antisense
oligonucleotide herein described and a pharmaceutically acceptable carrier.

According to still an additional aspect of the present invention there
is provided a ribozyme comprising the antisense oligonucleotide described
herein and a ribozyme sequence.

According to a further aspect of the present invention there is
provided an antisense nucleic acid construct comprising a promoter
sequence and a polynucleotide sequence directing the synthesis of an
antisense RNA sequence of at least 10 bases being hybridizable in vivo,
under physiological conditions, with a portion of a polynucleotide strand
encoding a polypeptide having heparanase catalytic activity.

According to further features in preferred embodiments of the
invention described below, the polynucleotide strand encoding the
polypeptide having heparanase catalytic activity is as set forth in SEQ ID
NOs:9, 13, 42 or 43.

According to still further features in the described preferred
embodiments the polypeptide having heparanase catalytic activity is as set
forth in SEQ ID NOs: 10, 14 or 44.

According to still a further aspect of the present invention there is
provided a method of in vivo downregulating heparanase activity
comprising the step of in vivo administering the antisense nucleic acid
construct herein described.

According to yet a further aspect of the present invention there is
provided a pharmaceutical composition comprising the antisense nucleic
acid construct herein described and a pharmaceutically acceptable carrier.

According to a further aspect of the present invention there is
provided a nucleic acid construct comprising a polynucleotide sequence
functioning as a promoter, the polynucleotide sequence is derived from SEQ
ID NO:42 and includes at least nucleotides 2535-2635 thereof or from SEQ
ID NO:43 and includes at least nucleotides 320-420.

According to a further aspect of the present invention there is
provided a method of expressing a polynucleotide sequence comprising the
step of ligating the polynucleotide sequence into the nucleic acid construct
described above, downstream of the polynucleotide sequence derived from
SEQ ID NOs:42 or 43.

According to a further aspect of the present invention there is
provided a recombinant protein comprising a polypeptide having heparanase
catalytic activity.

According to further features in preferred embodiments of the
invention described below, the polypeptide includes at least a portion of
SEQ ID NOs:10, 14 or 44.

According to still further features in the described preferred
embodiments the protein is encoded by a polynucleotide hybridizable with
SEQ ID NOs: 9, 13, 42, 43 or a portion thereof at 68 °C in 6 x SSC, 1 %
SDS, 5 x Denhardt's, 10 % dextran sulfate, 100 µg/ml salmon sperm DNA,
and 32P labeled probe and wash at 68 °C with 3 x SSC and 0.1 % SDS.

According to still further features in the described preferred
embodiments the protein is encoded by a polynucleotide at least 60 %
identical with SEQ ID NOs: 9, 13, 42, 43 or portions thereof as determined
using the Bestfit procedure of the DNA sequence analysis software package
developed by the Genetic Computer Group (GCG) at the University of
Wisconsin (gap creation penalty - 12, gap extension penalty - 4).

According to a further aspect of the present invention there is
provided a pharmaceutical composition comprising, as an active ingredient,
the recombinant protein herein described.

According to a further aspect of the present invention there is
provided a method of identifying a chromosome region harboring a
heparanase gene in a chromosome spread comprising the steps of (a)
hybridizing the chromosome spread with a tagged polynucleotide probe
encoding heparanase; (b) washing the chromosome spread, thereby
removing excess of non-hybridized probe; and (c) searching for signals
associated with the hybridized tagged polynucleotide probe, wherein
detected signals being indicative of a chromosome region harboring a
heparanase gene.

According to a further aspect of the present invention there is
provided a method of in vivo eliciting anti-heparanase antibodies
comprising the steps of administering a nucleic acid construct including a
polynucleotide segment corresponding to at least a portion of SEQ ID
NOs:9, 13 or 43 and a promoter for directing the expression of said
polynucleotide segment in vivo. Accordingly, there is provided also a DNA
vaccine for in vivo eliciting anti-heparanase antibodies comprising a nucleic
acid construct including a polynucleotide segment corresponding to at least
a portion of SEQ ID NOs:9, 13 or 43 and a promoter for directing the
expression of said polynucleotide segment in vivo.

The present invention can be used to develop new drugs to inhibit
tumor cell metastasis, inflammation and autoimmunity. The identification
of the hpa gene encoding for heparanase enzyme enables the production of
a recombinant enzyme in heterologous expression systems. Additional
features, advantages, uses and applications of the present invention in
biological science and in diagnostic and therapeutic medicine are described
hereinafter.



BRIEF DESCRIPTION OF THE DRAWINGS

The invention is herein described, by way of example only, with
reference to the accompanying drawings, wherein:

FIG. 1 presents nucleotide sequence and deduced amino acid
sequence of hpa cDNA. A single nucleotide difference at position 799 (A
to T) between the EST (Expressed Sequence Tag) and the PCR amplified
cDNA (reverse transcribed RNA) and the resulting amino acid substitution
(Tyr to Phe) are indicated above and below the substituted unit,
respectively. Cysteine residues and the poly adenylation consensus
sequence are underlined. The asterisk denotes the stop codon TGA.

FIG. 2 demonstrates degradation of soluble sulfate labeled HSPG
substrate by lysates of High Five cells infected with pFhpa2 virus. Lysates
of High Five cells that were infected with pFhpa2 virus (●) or control pF2
virus (□) were incubated (18 h, 37 °C) with sulfate labeled ECM-derived
soluble HSPG (peak I). The incubation medium was then subjected to gel
filtration on Sepharose 6B. Low molecular weight HS degradation
fragments (peak II) were produced only during incubation with the pFhpa2
infected cells, but there was no degradation of the HSPG substrate (♦) by
lysates of pF2 infected cells.

FIGs. 3a-b demonstrate degradation of soluble sulfate labeled HSPG
substrate by the culture medium of pFhpa2 and pFhpa4 infected cells.
Culture media of High Five cells infected with pFhpa2 (3a) or pFhpa4 (3b)
viruses (•), or with control viruses (□) were incubated (18 h, 37 °C) with
sulfate labeled ECM-derived soluble HSPG (peak I, ♦). The incubation
media were then subjected to gel filtration on Sepharose 6B. Low
molecular weight HS degradation fragments (peak II) were produced only
during incubation with the hpa gene containing viruses. There was no
degradation of the HSPG substrate by the culture medium of cells infected
with control viruses.

FIG. 4 presents size fractionation of heparanase activity expressed by
pFhpa2 infected cells. Culture medium of pFhpa2 infected High Five cells
was applied onto a 50 kDa cut-off membrane. Heparanase activity
(conversion of the peak I substrate, (♦) into peak II HS degradation
fragments) was found in the high (> 50 kDa) (•), but not low (< 50 kDa) (o)
molecular weight compartment.

FIGs. 5a-b demonstrate the effect of heparin on heparanase activity
expressed by pFhpa2 and pFhpa4 infected High Five cells. Culture media
of pFhpa2 (5a) and pFhpa4 (5b) infected High Five cells were incubated
(18 h, 37 °C) with sulfate labeled ECM-derived soluble HSPG (peak I, ♦) in
the absence (•) or presence (∇) of 10 μg/ml heparin. Production of low
molecular weight HS degradation fragments was completely abolished in
the presence of heparin, a potent inhibitor of heparanase activity (6, 7).

FIGs. 6a-b demonstrate degradation of sulfate labeled intact ECM by
virus infected High Five and Sf21 cells. High Five (6a) and Sf21 (6b) cells
were plated on sulfate labeled ECM and infected (48 h, 28 °C) with pFhpa4
(•) or control pF1 (□) viruses. Control non-infected Sf21 cells («) were
plated on the labeled ECM as well. The pH of the cultured medium was
adjusted to 6.0 - 6.2 followed by 24 h incubation at 37 °C. Sulfate labeled
material released into the incubation medium was analyzed by gel filtration
on Sepharose 6B. HS degradation fragments were produced only by cells
infected with the hpa containing virus.

FIGs. 7a-b demonstrate degradation of sulfate labeled intact ECM by
virus infected cells. High Five (7a) and Sf21 (7b) cells were plated on
sulfate labeled ECM and infected (48 h, 28 °C) with pFhpa4 (•) or control
pF1 (o) viruses. Control non-infected Sf21 cells (r) were plated on labeled
ECM as well. The pH of the cultured medium was adjusted to 6.0 - 6.2,
followed by 48 h incubation at 28 °C. Sulfate labeled degradation
fragments released into the incubation medium were analyzed by gel
filtration on Sepharose 6B. HS degradation fragments were produced only
by cells infected with the hpa containing virus.

FIGs. 8a-b demonstrate degradation of sulfate labeled intact ECM by
the culture medium of pFhpa4 infected cells. Culture media of High Five
(8a) and Sf21 (8b) cells that were infected with pFhpa4 (•) or control pF1
(□) viruses were incubated (48 h, 37 °C, pH 6.0) with intact sulfate labeled
ECM. The ECM was also incubated with the culture medium of control
non-infected Sf21 cells (∇). Sulfate labeled material released into the
reaction mixture was subjected to gel filtration analysis. Heparanase
activity was detected only in the culture medium of pFhpa4 infected cells.

FIGs. 9a-b demonstrate the effect of heparin on heparanase activity
in the culture medium of pFhpa4 infected cells. Sulfate labeled ECM was
incubated (24 h, 37 °C, pH 6.0) with culture medium of pFhpa4 infected
High Five (9a) and Sf21 (9b) cells in the absence (•) or presence (∇) of 10
μg/ml heparin. Sulfate labeled material released into the incubation
medium was subjected to gel filtration on Sepharose 6B. Heparanase
activity (production of peak II HS degradation fragments) was completely
inhibited in the presence of heparin.

FIGs. 10a-b demonstrate purification of recombinant heparanase on
heparin-Sepharose. [...] virus was subjected to heparin-Sepharose
chromatography. Elution of fractions was performed with 0.5 - 2 M NaCl
gradient (♦). Heparanase activity in the eluted fractions is demonstrated in
Figure 10a (•). Fractions 15 - 28 were subjected to 15 % SDS-polyacrylamide
gel electrophoresis followed by silver nitrate staining. A correlation is
demonstrated between a major protein band (MW ~ 63,000) in fractions
19 - 24 and heparanase activity.

FIGs. 11a-b demonstrate purification of recombinant heparanase on
a Superdex 75 column. Active fractions eluted from heparin-Sepharose
(Figure 10a) were pooled, concentrated and applied onto
Superdex 75 FPLC column. Fractions were collected and aliquots of each
fraction were tested for heparanase activity (c, Figure 11a) and analyzed by
SDS-polyacrylamide gel electrophoresis followed by silver nitrate staining
(Figure 11b). A correlation is seen between the appearance of a major
protein band (MW ~ 63,000) in fractions 4 - 7 and heparanase activity.

FIGs. 12a-e demonstrate expression of the hpa gene by RT-PCR with
total RNA from human embryonal tissues (12a), human extra-embryonal
tissues (12b) and cell lines from different origins (12c-e). RT-PCR products
using hpa specific primers (I), primers for GAPDH housekeeping gene (II),
and control reactions without reverse transcriptase demonstrating absence of
genomic DNA or other contamination in RNA samples (III). M - DNA
molecular weight marker VI (Boehringer Mannheim). For 12a: lane 1 -
neutrophil cells ([...]), lane 2 - muscle, lane [...]. For 12b: [...]
cytotrophoblast cells ([...]), lane 9 - cytotrophoblast cells ([...] h in
vitro), lane 10 - cytotrophoblast cells (6 h in vitro), lane 11 -
cytotrophoblast cells (18 h in vitro), lane 12 - cytotrophoblast cells (48 h in
vitro). For 12c: lane 1 - [...], lane 2 - NCI [...], lane 3 - SW-480 human
hepatoma cell line, lane 4 - HTR (cytotrophoblasts transformed by SV40),
lane 5 - [...] hepatocellular carcinoma cell line, lane 6 - EJ28 bladder
carcinoma cell line. For 12d: lane 1 - SK-hep-1 human hepatoma cell line,
lane 2 - DAMI human megakaryocyte cell line, lane 3 - DAMI cell line +
PMA, lane 4 - CHRF cell line + PMA, lane 5 - CHRF cell line. For 12e:
lane 1 - ABAE bovine aortic endothelial cells, lane 2 - 1063 human ovarian
cell line, lane 3 - human breast carcinoma MDA435 cell line, lane 4 - human
breast carcinoma MDA231 cell line.

FIG. 13 presents a comparison between nucleotide sequences of the
human hpa and a mouse EST cDNA fragment (SEQ ID NO:12) which is 80
% homologous to the 3' end (starting at nucleotide [...]) of the human hpa.
The aligned termination codons are underlined.

FIG. 14 demonstrates the chromosomal localization of the hpa gene.
[...] DNA of hamster, mouse and human were separated on 0.7 % agarose gel
following amplification with hpa specific primers. Lane 1 - Lambda DNA
[...] amplification products. Lanes 3-5 - human, mouse and hamster genomic
DNA, respectively. Lanes 6-29 - human monochromosomal somatic cell
hybrids representing chromosomes 1-22 and X and Y, respectively. Lane
30 - Lambda DNA digested with [...]. An amplification product of
approximately 2.8 Kb is observed only in lanes 5 and 9, representing human
genomic DNA and DNA derived from cell hybrid carrying human
chromosome 4, respectively. These results demonstrate that the hpa gene is
localized in human chromosome 4.

FIG. 15 demonstrates the genomic exon-intron structure of the
human hpa locus (top) and the relative positions of the lambda clones used
[...]. The [...] rectangles represent exons (E) and the horizontal lines
therebetween represent introns [...].
Continuous lines represent DNA fragments, which were used for sequence
analysis. The discontinuous line in lambda 6 represents a region, which
overlaps with lambda 8 and hence was not analyzed. The plasmid contains
a PCR product, which bridges the gap between L3 and L6.

FIG. 16 presents the nucleotide sequence of the genomic region [...]
lower case. The deduced amino acid sequence of the exons is printed below
the nucleotide sequence. Two predicted transcription start sites are shown
in bold.



FIG. 17 presents an alignment of the amino acid sequences of human
heparanase, mouse and partial sequences of rat homologues. The human
and the mouse sequences were determined by sequence analysis of the
[...] clones, which represent two different regions (5' and 3') of the rat hpa
cDNA. The human sequence and the amino acids in the mouse and rat
homologues, which are identical to the human sequence, appear in bold.



FIG. 18 presents a heparanase Zoo blot. Ten micrograms of genomic
DNA from various sources were digested with EcoRI and separated on 0.7
% agarose - TBE gel. Following electrophoresis, the gel was treated with
HCl and then with NaOH and the DNA fragments were downward
transferred to a nylon membrane (Hybond N+, Amersham) with 0.4 N
NaOH. The membrane was hybridized with a 1.6 Kb DNA probe that
[...] Rat; P - Pig; Cw - Cow; Ht - Horse; S - Sheep; Rb - Rabbit; D - Dog; Ch
- Chicken; F - Fish. Size markers (Lambda [...]) are shown on the left.

FIG. 19 demonstrates the secondary structure prediction for
heparanase performed using the PHD server - Profile network Prediction
Heidelberg. H - helix, E - extended (beta strand). The glutamic acid
predicted as the proton donor is marked by asterisk and the possible
nucleophiles are underlined.



DESCRIPTION OF THE PREFERRED EMBODIMENTS 

The present invention is of a polynucleotide or nucleic acid, referred 
to hereinbelow interchangeably as hpa, hpa cDNA or hpa gene or identified 
by its SEQ ID NOs, encoding a polypeptide having heparanase activity, 
vectors or nucleic acid constructs including same and which are used for 
over-expression or antisense inhibition of heparanase, genetically modified 
cells expressing same, recombinant protein having heparanase activity, 
antisense oligonucleotides and ribozymes for heparanase modulation, and 
heparanase promoter sequences which can be used to direct the expression 
of desired genes. 

Before explaining at least one embodiment of the invention in detail, 
it is to be understood that the invention is not limited in its application to the 
details of construction and the arrangement of the components set forth in 
the following description or illustrated in the drawings. The invention is
capable of other embodiments or of being practiced or carried out in various
ways. Also, it is to be understood that the phraseology and terminology
employed herein is for the purpose of description and should not be
regarded as limiting.

Cloning of the human and mouse hpa genes, cDNAs and genomic 
sequence (for human), encoding heparanase and expressing recombinant 
heparanase by transfected cells is reported herein. These are the first 
mammalian heparanase genes to be cloned. 

A purified preparation of heparanase isolated from human hepatoma 
cells was subjected to tryptic digestion and microsequencing. 

The YGPDVGQPR (SEQ ID NO:8) sequence revealed was used to 
screen EST databases for homology to the corresponding back translated 
DNA sequences. Two closely related EST sequences were identified and
were thereafter found to be identical.

Both clones contained an insert of 1020 bp which includes an open 
reading frame of 973 bp followed by a 3' untranslated region of 27 bp and a 
Poly A tail, whereas a translation start site was not identified. 

Cloning of the missing 5' end was performed by PCR amplification 
of DNA from placenta Marathon RACE cDNA composite using primers 
selected according to the EST clones sequence and the linkers of the 
composite.

A 900 bp PCR fragment, partially overlapping with the identified 3'
encoding EST clones was obtained. The joined cDNA fragment (hpa),
1721 bp long (SEQ ID NO:9), contained an open reading frame which
encodes, as shown in Figure 1 and SEQ ID NO: 11, a polypeptide of 543
amino acids (SEQ ID NO: 10) with a calculated molecular weight of 61,192
daltons.
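
As a quick consistency check on the figures above, the coding capacity implied by a 543 amino acid polypeptide can be compared with the 1721 bp cDNA. The sketch below is illustrative only: the average residue mass of 110 Da is a common rule of thumb and an assumption, not a value from this application, so the estimate is expected to differ somewhat from the calculated 61,192 daltons.

```python
# Illustrative arithmetic only; AVG_RESIDUE_MASS_DA is a rule-of-thumb assumption.
CDNA_LENGTH_BP = 1721      # joined hpa cDNA (SEQ ID NO:9)
POLYPEPTIDE_AA = 543       # encoded polypeptide (SEQ ID NO:10)
REPORTED_MW_DA = 61192     # calculated molecular weight stated in the text
AVG_RESIDUE_MASS_DA = 110  # assumed average mass per amino acid residue


def orf_length_bp(n_amino_acids: int) -> int:
    """Nucleotides needed to encode n amino acids plus one stop codon."""
    return 3 * (n_amino_acids + 1)


orf_bp = orf_length_bp(POLYPEPTIDE_AA)
non_coding_bp = CDNA_LENGTH_BP - orf_bp
estimated_mw = POLYPEPTIDE_AA * AVG_RESIDUE_MASS_DA
relative_diff = abs(estimated_mw - REPORTED_MW_DA) / REPORTED_MW_DA

print(f"ORF including stop codon: {orf_bp} bp")
print(f"Remaining non-coding sequence: {non_coding_bp} bp")
print(f"Rule-of-thumb mass: {estimated_mw} Da ({relative_diff:.1%} from reported)")
```

The open reading frame accounts for most of the 1721 bp, and the rule-of-thumb mass lands within a few percent of the stated value, which is as close as such an estimate can be expected to get.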

A single nucleotide difference at position 799 (A to T) between the 
EST clones and the PCR amplified cDNA was observed. This difference 
results in a single amino acid substitution (Tyr to Phe) (Figure 1). 
Furthermore, the published EST sequences contained an unidentified 
nucleotide, which following DNA sequencing of both the EST clones was 
resolved into two nucleotides (G and C at positions 1630 and 1631 in SEQ 
ID NO: 9, respectively). 
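
The Tyr to Phe substitution follows directly from the genetic code: tyrosine is encoded by TAT or TAC and phenylalanine by TTT or TTC, so an A to T change at the second codon position converts one into the other. The sketch below illustrates this with a partial codon table. Which tyrosine codon is actually present at position 799 is not stated in this passage, so both are shown; the helper name is hypothetical.

```python
# Partial standard genetic code, limited to the codons relevant to this example.
CODON_TABLE = {"TAT": "Tyr", "TAC": "Tyr", "TTT": "Phe", "TTC": "Phe"}


def substitute(codon: str, index: int, base: str) -> str:
    """Return the codon with the base at 0-based position `index` replaced."""
    return codon[:index] + base + codon[index + 1:]


for tyr_codon in ("TAT", "TAC"):
    mutated = substitute(tyr_codon, 1, "T")  # A -> T at the second position
    print(f"{tyr_codon} ({CODON_TABLE[tyr_codon]}) -> "
          f"{mutated} ({CODON_TABLE[mutated]})")
```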

The ability of the hpa gene product to catalyze degradation of 
heparan sulfate in an in vitro assay was examined by expressing the entire 
open reading frame in insect cells, using the Baculovirus expression system. 

Extracts and conditioned media of cells infected with virus 
containing the hpa gene, demonstrated a high level of heparan sulfate 
degradation activity both towards soluble ECM-derived HSPG and intact 
ECM, which was inhibited by heparin, while cells infected with a similar 
construct containing no hpa gene had no such activity, nor did non-infected 
cells.

The expression pattern of hpa RNA in various tissues and cell lines
was investigated using RT-PCR. It was found to be expressed only in
tissues and cells previously known to have heparanase activity.

Cloning an extended 5' sequence was enabled from the human SK-hep1
cell line by PCR amplification using the Marathon RACE. The 5'
extended sequence of the SK-hep1 hpa cDNA was assembled with the
sequence of the hpa cDNA isolated from human placenta (SEQ ID NO:9).
The assembled sequence contained an open reading frame, SEQ ID NOs: 13
and 15, which encodes, as shown in SEQ ID NOs: 14 and 15, a polypeptide
of 592 amino acids, with a calculated molecular weight of 66,407 daltons.
This open reading frame was shown to direct the expression of catalytically
active heparanase in a mammalian cell expression system. The expressed
heparanase was detectable by anti-heparanase antibodies in Western blot
analysis.

A panel of monochromosomal human/CHO and human/mouse
somatic cell hybrids was used to localize the human heparanase gene to
human chromosome 4. The newly isolated heparanase sequence can
therefore be used to identify a chromosome region harboring a human
heparanase gene in a chromosome spread.

The hpa cDNA was then used as a probe to screen a human
genomic library. Several phages were positive. These phages were
analyzed and were found to cover most of the hpa locus, except for a small
portion which was recovered by bridging PCR. The hpa locus covers about
50,000 bp. The hpa gene includes 12 exons separated by 11 introns.

RT-PCR performed on a variety of cells revealed alternatively
spliced hpa transcripts.

The amino acid sequence of human heparanase was used to search 
for homologous sequences in the DNA and protein databases. Several 
human EST's were identified, as well as mouse sequences highly 
homologous to human heparanase. The following mouse EST's were
identified: AA177901, AA674378, AA67997, AA047943, AA690179 and
AI122034, all sharing an identical sequence and corresponding to amino acids
336-543 of the human heparanase sequence. The entire mouse heparanase
cDNA was cloned, based on the nucleotide sequence of the mouse EST's
using Marathon cDNA libraries. The mouse and the human hpa genes share
an average homology of 78 % between the nucleotide sequences and 81 %
similarity between the deduced amino acid sequences. hpa homologous
sequences from rat were also uncovered (EST's AI060284 and AI237828).

Homology search of heparanase amino acid sequence against the
DNA and the protein databases and prediction of its protein secondary
structure enabled identification of candidate amino acids that participate in
the heparanase active site.



Expression of hpa antisense in mammalian cell lines resulted in an
about five-fold decrease in the number of recoverable cells as compared to
controls.

Human hpa cDNA was shown to hybridize with genomic DNAs of a
variety of mammalian species and with an avian species.

The human and mouse hpa promoters were identified and the human
promoter tested positive in directing the expression of a reporter gene.

Thus, according to the present invention there is provided an isolated
nucleic acid comprising a genomic, complementary or composite
polynucleotide sequence encoding a polypeptide having heparanase
catalytic activity.

The phrase "composite polynucleotide sequence" refers to a sequence
which includes exonal sequences required to encode the polypeptide having
heparanase activity, as well as any number of intronal sequences. The
intronal sequences can be of any source and typically will include conserved
splicing signal sequences. Such intronal sequences may further include cis
acting expression regulatory elements.

The term "heparanase catalytic activity" or its equivalent term
"heparanase activity" both refer to a mammalian endoglycosidase
hydrolyzing activity which is specific for heparan or heparan sulfate
proteoglycan substrates, as opposed to the activity of bacterial enzymes
(heparinase I, II and III) which degrade heparin or heparan sulfate by means
of β-elimination (37).

According to a preferred embodiment of the present invention the
polynucleotide or a portion thereof is hybridizable with SEQ ID NOs: 9, 13,
42, 43 or a portion thereof at 68 °C in 6 x SSC, 1 % SDS, 5 x Denhardt's, 10
% dextran sulfate, 100 μg/ml salmon sperm DNA, and 32P labeled probe
and wash at 68 °C with 3, 2, 1, 0.5 or 0.1 x SSC and 0.1 % SDS.
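
The wash series above runs from low to high stringency, since lowering the SSC concentration lowers the sodium concentration and destabilizes imperfectly matched hybrids. The sketch below makes the dilution explicit, assuming the standard SSC recipe (20 x SSC is 3 M NaCl and 0.3 M trisodium citrate, so 1 x SSC is 0.15 M NaCl and 0.015 M trisodium citrate). The recipe is general laboratory knowledge and is not spelled out in this application, and the function name is hypothetical.

```python
# Assumed standard recipe: 1 x SSC = 0.15 M NaCl + 0.015 M trisodium citrate.
NACL_M_PER_1X = 0.15
CITRATE_M_PER_1X = 0.015
SODIUM_PER_CITRATE = 3  # trisodium citrate contributes three Na+ per molecule


def sodium_molarity(ssc_fold: float) -> float:
    """Approximate total Na+ molarity of an SSC solution of the given strength."""
    return ssc_fold * (NACL_M_PER_1X + SODIUM_PER_CITRATE * CITRATE_M_PER_1X)


HYBRIDIZATION_SSC = 6
WASH_SERIES_SSC = [3, 2, 1, 0.5, 0.1]  # wash strengths listed in the text

print(f"hybridization, {HYBRIDIZATION_SSC} x SSC: "
      f"{sodium_molarity(HYBRIDIZATION_SSC):.4f} M Na+")
for fold in WASH_SERIES_SSC:
    print(f"wash, {fold} x SSC: {sodium_molarity(fold):.4f} M Na+")
```

Each successive wash option carries less sodium than the one before it, so a polynucleotide that remains hybridized after the 0.1 x SSC wash meets the most demanding of the listed conditions.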

According to another preferred embodiment of the present invention 
the polynucleotide or a portion thereof is at least 60 %, preferably at least 65 
%, more preferably at least 70 %, more preferably at least 75 %, more 
preferably at least 80 %, more preferably at least 85 %, more preferably at 
least 90 %, most preferably, 95-100 % identical with SEQ ID NOs: 9, 13, 
42, 43 or portions thereof as determined using the Bestfit procedure of the 
DNA sequence analysis software package developed by the Genetics
Computer Group (GCG) at the University of Wisconsin (gap creation
penalty - 12, gap extension penalty - 4 - which are the default parameters). 
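
The identity tiers above can be read as thresholds applied to a single number computed from a pairwise alignment. The sketch below is a minimal illustration that assumes the two sequences have already been aligned (for example by Bestfit) and are supplied as equal-length strings with '-' marking gaps. It does not reproduce the Bestfit algorithm, and conventions for the denominator vary between tools. The function names and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns holding the same base in both sequences.

    Assumption: identity is taken over all alignment columns, gaps included.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Thresholds recited in the text, from broadest to narrowest.
TIERS = [60, 65, 70, 75, 80, 85, 90, 95]


def highest_tier_met(identity: float):
    """Return the highest recited threshold the identity reaches, or None."""
    met = [tier for tier in TIERS if identity >= tier]
    return max(met) if met else None


# Toy pre-aligned sequences (hypothetical, not taken from any SEQ ID NO).
seq_a = "ATGCTGCTGCGCTCGAAGCC"
seq_b = "ATGCTGCTGC-CTCGAAGCT"
identity = percent_identity(seq_a, seq_b)
print(f"{identity:.1f} % identical, highest tier met: {highest_tier_met(identity)}")
```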

According to another preferred embodiment of the present invention 
the polypeptide encoded by the polynucleotide sequence is as set forth in 
SEQ ID NOs: 10, 14, 44 or portions thereof having heparanase catalytic 
activity. Such portions are expected to include amino acids Asp-Glu 224- 
225 (SEQ ID NO: 10), which can serve as proton donors and glutamic acid 
343 or 396 which can serve as a nucleophile. 






According to another preferred embodiment of the present invention 
the polypeptide encoded by the polynucleotide sequence is at least 60 %, 
preferably at least 65 %, more preferably at least 70 %, more preferably at 
least 75 %, more preferably at least 80 %, more preferably at least 85 %, 
more preferably at least 90 %, most preferably, 95-100 % homologous (both
similar and identical amino acids) to SEQ ID NOs:10, 14, 44 or portions thereof as
determined with the Smith-Waterman algorithm, using the Bioaccelerator
platform developed by Compugen (gapop: 10.0, gapext: 0.5, matrix:
blosum62, see also the description to Figure 17). 
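
To make the role of the gapop and gapext parameters concrete, the following is a minimal sketch of Smith-Waterman local alignment scoring with affine gap penalties. It is not the Bioaccelerator implementation. Two simplifications are assumptions made here: a flat match/mismatch score stands in for the BLOSUM62 matrix, and a gap of length k is charged gapop + (k - 1) x gapext, which is one common convention among several.

```python
NEG_INF = float("-inf")


def smith_waterman_score(seq_a, seq_b, match=5.0, mismatch=-4.0,
                         gap_open=10.0, gap_extend=0.5):
    """Best local alignment score with affine gaps (Gotoh recurrences).

    Assumptions: a flat match/mismatch score stands in for BLOSUM62, and a
    gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    # h: best score of an alignment ending at (i, j);
    # e / f: best score of an alignment ending in a gap in seq_a / seq_b.
    h = [[0.0] * cols for _ in range(rows)]
    e = [[NEG_INF] * cols for _ in range(rows)]
    f = [[NEG_INF] * cols for _ in range(rows)]
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            e[i][j] = max(h[i][j - 1] - gap_open, e[i][j - 1] - gap_extend)
            f[i][j] = max(h[i - 1][j] - gap_open, f[i - 1][j] - gap_extend)
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            h[i][j] = max(0.0, h[i - 1][j - 1] + pair, e[i][j], f[i][j])
            best = max(best, h[i][j])
    return best


# The tryptic peptide from the text aligned against itself.
print(smith_waterman_score("YGPDVGQPR", "YGPDVGQPR"))

# A toy pair in which bridging a three-residue insertion with one gap pays off.
print(smith_waterman_score("A" * 10 + "C" * 10, "A" * 10 + "GGG" + "C" * 10))
```

With a low extension penalty relative to the opening penalty, one long gap is much cheaper than several short ones, which is the behaviour the gapop: 10.0, gapext: 0.5 setting is intended to produce.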

Further according to the present invention there is provided a nucleic
acid construct comprising the isolated nucleic acid described herein. The
construct may, and preferably does, further include an origin of replication and
trans regulatory elements, such as promoter and enhancer sequences.

The construct or vector can be of any type. It may be a phage which
infects bacteria or a virus which infects eukaryotic cells. It may also be a
plasmid, phagemid, cosmid, bacmid or an artificial chromosome.

Further according to the present invention there is provided a host
cell comprising the nucleic acid construct described herein. The host cell
can be of any type. It may be a prokaryotic cell, a eukaryotic cell, a cell
line, or a cell as a portion of an organism. The polynucleotide encoding
heparanase can be permanently or transiently present in the cell. In other
words, genetically modified cells obtained following stable or transient
transfection, transformation or transduction are all within the scope of the
present invention. The polynucleotide can be present in the cell in low copy
(say 1-5 copies) or high copy number (say 5-50 copies or more). It may be
integrated in one or more chromosomes at any location or be present as
extrachromosomal material.

The present invention is further directed at providing a heparanase
over-expression system which includes a cell overexpressing heparanase
catalytic activity. The cell may be a genetically modified host cell
transiently or stably transfected or transformed with any suitable vector
which includes a polynucleotide sequence encoding a polypeptide having
heparanase activity and suitable promoter and enhancer sequences to
direct over-expression of heparanase. However, the overexpressing cell
may also be a product of an insertion (e.g., via homologous recombination)
of a promoter and/or enhancer sequence downstream to the endogenous
heparanase gene of the expressing cell, which will direct over-expression
from the endogenous gene.



The term "over-expression" as used herein in the specification and
claims below refers to a level of expression which is higher than a basal
level of expression typically characterizing a given cell under otherwise
identical conditions.

According to another aspect the present invention provides an 
antisense oligonucleotide comprising a polynucleotide or a polynucleotide
analog of at least 10, preferably 11-15, more preferably 16-17, more
preferably 18, more preferably 19-25, more preferably 26-35, most 
preferably 35-100 bases being hybridizable in vivo, under physiological 
conditions, with a portion of a polynucleotide strand encoding a polypeptide 
having heparanase catalytic activity. The antisense oligonucleotide can be 
used for downregulating heparanase activity by in vivo administration 
thereof to a patient. As such, the antisense oligonucleotide according to the 
present invention can be used to treat types of cancers which are 
characterized by impaired (over) expression of heparanase, and are 
dependent on the expression of heparanase for proliferating or forming 
metastases. 

The antisense oligonucleotide can be DNA or RNA or even include 
nucleotide analogs, examples of which are provided in the Background 
section hereinabove. The antisense oligonucleotide according to the present 
invention can be synthetic and is preferably prepared by solid phase 
synthesis. In addition, it can be of any desired length which still provides 
specific base pairing (e.g., 8 or 10, preferably more, nucleotides long) and it 
can include mismatches that do not hamper base pairing under physiological 
conditions. 
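
An antisense oligonucleotide is, in sequence terms, the reverse complement of the targeted stretch of the sense strand. The sketch below derives a DNA antisense sequence for a chosen window and enforces the minimum length of 10 bases recited above. The target fragment is a made-up placeholder, not a portion of any SEQ ID NO, and the helper names are hypothetical. Practical antisense design also weighs target accessibility and specificity, which are not modeled here.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
MIN_LENGTH = 10  # minimum oligonucleotide length recited in the text


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def antisense_oligo(sense: str, start: int, length: int) -> str:
    """DNA antisense oligo against `length` bases of `sense` from 1-based `start`."""
    if length < MIN_LENGTH:
        raise ValueError(f"oligo must be at least {MIN_LENGTH} bases long")
    window = sense[start - 1:start - 1 + length]
    if len(window) != length:
        raise ValueError("window extends past the end of the sequence")
    return reverse_complement(window)


# Placeholder sense-strand fragment (hypothetical, for illustration only).
sense_fragment = "ATGGCTAGCTTGACCGATCGTACGGATC"
oligo = antisense_oligo(sense_fragment, start=1, length=18)
print(oligo)
```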

Further according to the present invention there is provided a 
pharmaceutical composition comprising the antisense oligonucleotide
herein described and a pharmaceutically acceptable carrier. The carrier can
be, for example, a liposome loadable with the antisense oligonucleotide. 

According to a preferred embodiment of the present invention the 
antisense oligonucleotide further includes a ribozyme sequence. The 
ribozyme sequence serves to cleave a heparanase RNA molecule to which 
the antisense oligonucleotide binds, to thereby downregulate heparanase 
expression. 

Further according to the present invention there is provided an 
antisense nucleic acid construct comprising a promoter sequence and a 
polynucleotide sequence directing the synthesis of an antisense RNA 
sequence of at least 10 bases being hybridizable in vivo, under physiological 
conditions, with a portion of a polynucleotide strand encoding a polypeptide 
having heparanase catalytic activity. Like the antisense oligonucleotide, the 
antisense construct can be used for downregulating heparanase activity by in 
vivo administration thereof to a patient. As such, the antisense construct, 
like the antisense oligonucleotide, according to the present invention can be 
used to treat types of cancers which are characterized by impaired (over) 
expression of heparanase, and are dependent on the expression of 
heparanase for proliferating or forming metastases. 

Thus, further according to the present invention there is provided a 
pharmaceutical composition comprising the antisense construct herein
described and a pharmaceutically acceptable carrier. The carrier can be, for
example, a liposome loadable with the antisense construct.

Formulations for topical administration may include, but are not 
limited to, lotions, ointments, gels, creams, suppositories, drops, liquids,
sprays and powders. Conventional pharmaceutical carriers, aqueous,
powder or oily bases, thickeners and the like may be necessary or desirable.
Coated condoms, stents, active pads, and other medical devices may also be
useful. Compositions for oral administration include powders or granules,
suspensions or solutions in water or non-aqueous media, sachets, capsules
or tablets. Thickeners, diluents, flavorings, dispersing aids, emulsifiers or
binders may be desirable. Formulations for parenteral administration may
include, but are not limited to, sterile aqueous solutions which may also
contain buffers, diluents and other suitable additives. 

Dosing is dependent on severity and responsiveness of the condition
to be treated, but will normally be one or more doses per day, week or
month with course of treatment lasting from several days to several months 
or until a cure is effected or a diminution of disease state is achieved. 
Persons ordinarily skilled in the art can easily determine optimum dosages, 
dosing methodologies and repetition rates. 

Further according to the present invention there is provided a nucleic
acid construct comprising a polynucleotide sequence functioning as a
promoter, the polynucleotide sequence is derived from SEQ ID NO:42 and
includes at least nucleotides 2135-2635, preferably 2235-2635, more
preferably 2335-2635, more preferably 2435-2635, most preferably 2535- 
2635 thereof, or SEQ ID NO:43 and includes at least nucleotides 1-420, 
preferably 120-420, more preferably 220-420, most preferably 320-420, 
thereof. These nucleotides are shown in the example section that follows to 
direct the synthesis of a reporter gene in transformed cells. Thus, further 
according to the present invention there is provided a method of expressing 
a polynucleotide sequence comprising the step of ligating the 
polynucleotide sequence downstream to either of the promoter sequences 
described herein. Heparanase promoters can be isolated from a variety of
mammalian and other species by cloning genomic regions present 5' to the
coding sequence thereof. This is readily achievable by one ordinarily
skilled in the art using the heparanase polynucleotides described herein,
which are shown in the Examples section that follows to participate in 
efficient cross species hybridization. 
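
The nested promoter fragments above are specified as nucleotide ranges. The sketch below shows how such ranges translate into fragment lengths and into 0-based, end-exclusive slices, assuming the ranges are 1-based and inclusive (the usual convention for sequence listings, but an assumption here). The sequence used is a synthetic placeholder, since SEQ ID NO:42 itself is not reproduced in this section, and the function names are hypothetical.

```python
# Ranges recited for the SEQ ID NO:42-derived promoter, broadest first.
PROMOTER_RANGES = [(2135, 2635), (2235, 2635), (2335, 2635),
                   (2435, 2635), (2535, 2635)]


def fragment_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive nucleotide range."""
    return end - start + 1


def extract(sequence: str, start: int, end: int) -> str:
    """Return nucleotides start..end (1-based, inclusive) of `sequence`."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("range falls outside the sequence")
    return sequence[start - 1:end]


# Synthetic placeholder standing in for the real sequence (assumption).
placeholder = "ACGT" * 700  # 2800 nt, long enough to cover position 2635

for start, end in PROMOTER_RANGES:
    fragment = extract(placeholder, start, end)
    print(f"{start}-{end}: {len(fragment)} nt")
```

Because every range ends at nucleotide 2635, each narrower fragment is a 3' portion of the broader ones, so the tiers describe progressively shorter stretches ending at the same position.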

Further according to the present invention there is provided a 
recombinant protein comprising a polypeptide having heparanase catalytic 
activity. The protein according to the present invention includes
modifications known as post translational modifications, including, but not
limited to, proteolysis (e.g., removal of a signal peptide and of a pro- or
preprotein sequence), methionine modification, glycosylation, alkylation
(e.g., methylation), acetylation, etc. According to preferred embodiments
the polypeptide includes at least a portion of SEQ ID NOs:10, 14 or 44, which
portion has heparanase catalytic activity. According to preferred
embodiments of the present invention the protein is encoded by any of the 
above described isolated nucleic acids. Further according to the present 
invention there is provided a pharmaceutical composition comprising, as an 
active ingredient, the recombinant protein described herein. 

The recombinant protein may be purified by any conventional protein 
purification procedure close to homogeneity and/or be mixed with additives. 
The recombinant protein may be manufactured using any of the genetically 
modified cells described above, which include any of the expression nucleic 
acid constructs described herein. The recombinant protein may be in any 
form. It may be in a crystallized form, a dehydrated powder form or in 
solution. The recombinant protein may be useful in obtaining pure
heparanase, which in turn may be useful in eliciting anti-heparanase
antibodies, either polyclonal or monoclonal, and as a screening active
ingredient in a screening assay or system for heparanase inhibitors or
drugs.

Further according to the present invention there is provided a method 
of identifying a chromosome region harboring a human heparanase gene in 
a chromosome spread, the method is executed implementing the following 
method steps, in which in a first step the chromosome spread (either 
interphase or metaphase spread) is hybridized with a tagged polynucleotide
probe encoding heparanase. The tag is preferably a fluorescent tag. In a
second step according to the method the chromosome spread is washed, 
thereby excess of non-hybridized probe is removed. Finally, signals 
associated with the hybridized tagged polynucleotide probe are searched for, 
wherein detected signals being indicative of a chromosome region harboring 
the human heparanase gene. One ordinarily skilled in the art would know 
how to use the sequences disclosed herein in suitable labeling reactions and 
how to use the tagged probes to detect, using in situ hybridization, a 
chromosome region harboring a human heparanase gene. 

Further according to the present invention there is provided a method 
of in vivo eliciting anti-heparanase antibodies comprising the steps of 
administering a nucleic acid construct including a polynucleotide segment 
corresponding to at least a portion of SEQ ID NOs:9, 13 or 43 and a 
promoter for directing the expression of said polynucleotide segment in 
vivo. Accordingly, there is provided also a DNA vaccine for in vivo 
eliciting anti-heparanase antibodies comprising a nucleic acid construct 
including a polynucleotide segment corresponding to at least a portion of 
SEQ ID NOs:9, 13 or 43 and a promoter for directing the expression of said 
polynucleotide segment in vivo. The vaccine optionally further includes a 
pharmaceutically acceptable carrier, such as a virus, liposome or an antigen 
presenting cell. Alternatively, the vaccine is employed as a naked DNA
vaccine.


The present invention can be used to develop treatments for various 
diseases, to develop diagnostic assays for these diseases and to provide new 
tools for basic research, especially in the fields of medicine and biology.

Specifically, the present invention can be used to develop new drugs 
to inhibit tumor cell metastasis, inflammation and autoimmunity. The 
identification of the hpa gene encoding the heparanase enzyme enables
the production of a recombinant enzyme in heterologous expression 
systems. 

Furthermore, the present invention can be used to modulate 
bioavailability of heparin-binding growth factors, cellular responses to 
heparin-binding growth factors (e.g., bFGF, VEGF) and cytokines (e.g., IL-8),
cell interaction with plasma lipoproteins, cellular susceptibility to viral,
protozoal and some bacterial infections, and disintegration of
neurodegenerative plaques. Recombinant heparanase offers a potential
treatment for wound healing, angiogenesis, restenosis, atherosclerosis,
inflammation, neurodegenerative diseases (such as, for example,
Gerstmann-Straussler Syndrome, Creutzfeldt-Jakob disease, Scrapie and
Alzheimer's disease) and certain viral and some bacterial and protozoal
infections. Recombinant heparanase can be used to neutralize plasma 
heparin, as a potential replacement of protamine. 

As used herein, the term "modulate" includes substantially inhibiting, 
slowing or reversing the progression of a disease, substantially ameliorating
clinical symptoms of a disease or condition, or substantially preventing the
appearance of clinical symptoms of a disease or condition. A "modulator"
therefore includes an agent which may modulate a disease or condition.
Modulation of viral, protozoal and bacterial infections includes any effect
which substantially interrupts, prevents or reduces any viral, bacterial or
protozoal activity and/or stage of the virus, bacterium or protozoon life
cycle, or which reduces or prevents infection by the virus, bacterium or 
protozoon in a subject, such as a human or lower animal. 

As used herein, the term "wound" includes any injury to any portion 
of the body of a subject including, but not limited to, acute conditions such 
as thermal burns, chemical burns, radiation burns, burns caused by excess 
exposure to ultraviolet radiation such as sunburn, damage to bodily tissues 
such as the perineum as a result of labor and childbirth, including injuries 
sustained during medical procedures such as episiotomies, trauma-induced 
injuries including cuts, those injuries sustained in automobile and other 
mechanical accidents, and those caused by bullets, knives and other 
weapons, and post-surgical injuries, as well as chronic conditions such as 
pressure sores, bedsores, conditions related to diabetes and poor circulation,
and all types of acne, etc.

Anti-heparanase antibodies, raised against the recombinant enzyme, 
would be useful for immunodetection and diagnosis of micrometastases, 
autoimmune lesions and renal failure in biopsy specimens, plasma samples,
and body fluids. Such antibodies may also serve as neutralizing agents for
heparanase activity.

The genomic heparanase sequences described herein can be used to 
construct knock-in and knock-out constructs. Such constructs include a
fragment of 10-20 Kb of a heparanase locus and negative and positive
selection markers and can be used to provide heparanase knock-in and
knock-out animal models by methods known to the skilled artisan. Such
animal models can be used for studying the function of heparanase in
developmental processes, and in normal as well as pathological processes.
They can also serve as an experimental model for testing drugs and gene
therapy protocols. The complementary heparanase sequence (cDNA) can
be used to derive transgenic animals overexpressing heparanase for the same
purposes. Alternatively, if cloned in the antisense orientation, the complementary
heparanase sequence (cDNA) can be used to derive transgenic animals
under-expressing heparanase for the same purposes.

The heparanase promoter sequences described herein and other cis 
regulatory elements linked to the heparanase locus can be used to regulate
the expression of genes. For example, these promoters can be used to
direct the expression of a cytotoxic protein, such as TNF, in tumor cells. It
will be appreciated that heparanase itself is abnormally expressed under the
control of its own promoter and other cis acting elements in a variety of
tumors, and its expression is correlated with metastasis. It is also
abnormally highly expressed in inflammatory cells. The introns of the
heparanase gene can be used for the same purpose, as it is known that
introns, especially upstream introns, include cis acting elements which affect
expression. A heparanase promoter fused to a reporter protein can be used
to study/monitor its activity.

The polynucleotide sequences described herein can also be used to 
provide DNA vaccines which will elicit in vivo anti-heparanase antibodies.
Such vaccines can therefore be used to combat inflammation and cancer.

Antisense oligonucleotides derived according to the heparanase 
sequences described herein, especially such oligonucleotides supplemented 
with ribozyme activity, can be used to modulate heparanase expression. 
Such oligonucleotides can be from the coding region, from the introns or 
promoter specific. Antisense heparanase nucleic acid constructs can 
similarly function, as well known in the art. 

The heparanase sequences described herein can be used to study the 
catalytic mechanism of heparanase. Carefully selected site directed 
mutagenesis can be employed to provide modified heparanase proteins 
having modified characteristics in terms of, for example, substrate 
specificity, sensitivity to inhibitors, etc. 

While studying heparanase expression in a variety of cell types,
alternatively spliced transcripts were identified. Such transcripts, if found
characteristic of certain pathological conditions, can be used as markers for
such conditions. Such transcripts are expected to direct the synthesis of
heparanases with altered functions.

Additional objects, advantages, and novel features of the present 
invention will become apparent to one ordinarily skilled in the art upon
examination of the following examples, which are not intended to be 
limiting. Additionally, each of the various embodiments and aspects of the 
present invention as delineated hereinabove and as claimed in the claims 
section below finds experimental support in the following examples. 






EXAMPLES 



Generally, the nomenclature used herein and the laboratory
procedures in recombinant DNA technology described below are those well
known and commonly employed in the art. Standard techniques are used for
cloning, DNA and RNA isolation, amplification and purification. Generally,
enzymatic reactions involving DNA ligase, DNA polymerase, restriction
endonucleases and the like are performed according to the manufacturers'
specifications. These techniques and various other techniques are generally
performed according to Sambrook et al., Molecular Cloning - A Laboratory
Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (1989),
which is incorporated herein by reference. Other general references are
provided throughout this document. The procedures therein are believed to
be well known in the art and are provided for the convenience of the reader.
All the information contained therein is incorporated herein by reference.

The following protocols and experimental details are referenced in 
the Examples that follow:



Purification and characterization of heparanase from a human
hepatoma cell line and human placenta: A human hepatoma cell line
(Sk-hep-1) was chosen as a source for purification of a human tumor-derived
heparanase. Purification was essentially as described in U.S. Pat. No.
5,362,641 to Fuks, which is incorporated by reference as if fully set forth
herein. Briefly, 500 liters, 5 x 10^11 cells, were grown in suspension and the
heparanase enzyme was purified about 240,000 fold by applying the
following steps: (i) cation exchange (CM-Sephadex) chromatography
performed at pH 6.0, 0.3-1.4 M NaCl gradient; (ii) cation exchange
(CM-Sephadex) chromatography performed at pH 7.4 in the presence of 0.1 %
CHAPS, 0.3-1.1 M NaCl gradient; (iii) heparin-Sepharose chromatography
performed at pH 7.4 in the presence of 0.1 % CHAPS, 0.35-1.1 M NaCl
gradient; (iv) ConA-Sepharose chromatography performed at pH 6.0 in
buffer containing 0.1 % CHAPS and 1 M NaCl, elution with 0.25 M
α-methyl mannoside; and (v) HPLC cation exchange (Mono-S)
chromatography performed at pH 7.4 in the presence of 0.1 % CHAPS,
0.25-1 M NaCl gradient.

Active fractions were pooled, precipitated with TCA and the
precipitate subjected to SDS polyacrylamide gel electrophoresis and/or
tryptic digestion and reverse phase HPLC. Tryptic peptides of the purified
protein were separated by reverse phase HPLC (C8 column) and
homogeneous peaks were subjected to amino acid sequence analysis.

The purified enzyme was applied to reverse phase HPLC and
subjected to N-terminal amino acid sequencing using an amino acid
sequencer (Applied Biosystems).

Cells: Cultures of bovine corneal endothelial cells (BCECs) were 
established from steer eyes as previously described (19, 38). Stock cultures 
were maintained in DMEM (1 g glucose/liter) supplemented with 10 % 
newborn calf serum and 5 % FCS. bFGF (1 ng/ml) was added every other
day during the phase of active cell growth (13, 14).

Preparation of dishes coated with ECM: BCECs (second to fifth
passage) were plated into 4-well plates at an initial density of 2 x 10^5
cells/ml, and cultured in sulfate-free Fisher medium plus 5 % dextran T-40
for 12 days. Na2[35S]O4 (25 µCi/ml) was added on days 1 and 5 after seeding
and the cultures were incubated with the label without medium change. The
subendothelial ECM was exposed by dissolving (5 min., room temperature)
the cell layer with PBS containing 0.5 % Triton X-100 and 20 mM NH4OH,
followed by four washes with PBS. The ECM remained intact, free of
cellular debris and firmly attached to the entire area of the tissue culture
dish (19, 22).


To prepare soluble sulfate labeled proteoglycans (peak I material),
the ECM was digested with trypsin (25 µg/ml, 6 h, 37 °C), the digest was
concentrated by reverse dialysis and the concentrated material was applied
onto a Sepharose 6B gel filtration column. The resulting high molecular
weight material (Kav < 0.2, peak I) was collected. More than 80 % of the
labeled material was shown to be composed of heparan sulfate
proteoglycans (11, 39).

Heparanase activity: Cells (1 x 10^6/35-mm dish), cell lysates or
conditioned media were incubated on top of 35S-labeled ECM (18 h, 37 °C)
in the presence of 20 mM phosphate buffer (pH 6.2). Cell lysates and
conditioned media were also incubated with sulfate labeled peak I material
(10-20 µl). The incubation medium was collected, centrifuged (18,000 x g,
4 °C, 3 min.), and sulfate labeled material analyzed by gel filtration on a
Sepharose CL-6B column (0.9 x 30 cm). Fractions (0.2 ml) were eluted
with PBS at a flow rate of 5 ml/h and counted for radioactivity using
Bio-fluor scintillation fluid. The excluded volume (V0) was marked by blue
dextran and the total included volume (Vt) by phenol red. The latter was
shown to comigrate with free sulfate (7, 11, 23). Degradation fragments of
HS side chains were eluted from Sepharose 6B at 0.5 < Kav < 0.8 (peak II)
(7, 11, 23). A nearly intact HSPG released from ECM by trypsin - and, to a
lower extent, during incubation with PBS alone - was eluted next to V0
(Kav < 0.2, peak I). Recoveries of labeled material applied on the columns
ranged from 85 to 95 % in different experiments (11). Each experiment was
performed at least three times and the variation of elution positions (Kav
values) did not exceed +/- 15 %.
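
The Kav values used above to define peak I and peak II follow from the standard gel filtration relation Kav = (Ve - V0)/(Vt - V0), where Ve is the elution volume of a given fraction. The following Python sketch is illustrative only. The function names and the fraction numbers chosen for V0, Vt and the sample are hypothetical and are not taken from the experiments described herein; only the 0.2 ml fraction size and the Kav ranges for peaks I and II come from the text.

```python
def kav(elution_volume_ml, void_volume_ml, total_volume_ml):
    """Return Kav = (Ve - V0) / (Vt - V0) for a gel filtration fraction."""
    if total_volume_ml <= void_volume_ml:
        raise ValueError("Vt must be larger than V0")
    return (elution_volume_ml - void_volume_ml) / (total_volume_ml - void_volume_ml)


def fraction_to_volume(fraction_number, fraction_size_ml=0.2):
    """Convert a fraction number to cumulative elution volume in ml."""
    return fraction_number * fraction_size_ml


def classify_peak(kav_value):
    """Assign a Kav value to peak I or peak II using the ranges given in the text."""
    if kav_value < 0.2:
        return "peak I"
    if 0.5 < kav_value < 0.8:
        return "peak II"
    return "unassigned"


# Hypothetical calibration: blue dextran (V0) in fraction 10, phenol red (Vt) in fraction 50.
v0 = fraction_to_volume(10)
vt = fraction_to_volume(50)
sample = kav(fraction_to_volume(34), v0, vt)
print(round(sample, 2), classify_peak(sample))
```

With this hypothetical calibration, a fraction eluting at 34 falls in the peak II range, whereas a fraction eluting close to V0 would be assigned to peak I.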

Cloning of hpa cDNA: cDNA clones 257548 and 260138 were
obtained from the I.M.A.G.E Consortium (2130 Memorial Parkway SW,
Huntsville, AL 35801). The cDNAs were originally cloned in EcoRI and
NotI cloning sites in the plasmid vector pT3T7D-Pac. Although these
clones are reported to be somewhat different, DNA sequencing
demonstrated that these clones are identical to one another. Marathon
RACE (rapid amplification of cDNA ends) human placenta (poly-A) cDNA
composite was a gift of Prof. Yossi Shiloh of Tel Aviv University. This
composite is vector free, as it includes reverse transcribed cDNA fragments
to which double, partially single stranded adapters are attached on both
sides. The construction of the specific composite employed is described in
reference 39a.

Amplification of the hp3 PCR fragment was performed according to the
protocol provided by Clontech laboratories. The template used for
amplification was a sample taken from the above composite. The primers
used for amplification were:

First step: 5'-primer: AP1: 5'-CCATCCTAATACGACTCACTATAGGGC-3',
SEQ ID NO:1; 3'-primer: HPL229: 5'-GTAGTGATGCCATGTAACTGAATC-3',
SEQ ID NO:2.

Second step: nested 5'-primer: AP2: 5'-ACTCACTATAGGGCTCGAGCGGC-3',
SEQ ID NO:3; nested 3'-primer: HPL171: 5'-GCATCTTAGCCGTCTTTCTTCG-3',
SEQ ID NO:4. Primers HPL229 and
HPL171 were selected according to the sequence of the EST clones. They
include nucleotides 933-956 and 876-897 of SEQ ID NO:9, respectively.
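
The stated coordinates can be cross-checked against the primer lengths: a primer spanning nucleotides a to b (inclusive) must be b - a + 1 nucleotides long. The following Python sketch performs that check for HPL229 and HPL171 using the sequences and coordinates quoted above. The helper names are hypothetical. Because both are 3' primers, it is assumed here (and not verified against SEQ ID NO:9, which is not reproduced in this section) that their reverse complements are what would be read on the sense strand at those positions.

```python
# Primer sequences and SEQ ID NO:9 coordinate ranges as quoted in the text.
PRIMERS = {
    "HPL229": ("GTAGTGATGCCATGTAACTGAATC", 933, 956),
    "HPL171": ("GCATCTTAGCCGTCTTTCTTCG", 876, 897),
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def span_length(start, end):
    """Length of a 1-based, inclusive coordinate range."""
    return end - start + 1


def check_primer(name):
    """True if the primer length equals the length of its stated coordinate range."""
    seq, start, end = PRIMERS[name]
    return len(seq) == span_length(start, end)


for primer_name, (primer_seq, start, end) in PRIMERS.items():
    print(primer_name, len(primer_seq), span_length(start, end),
          check_primer(primer_name), reverse_complement(primer_seq))
```

Both primers pass the length check (24 and 22 nucleotides, respectively), which is consistent with the coordinate ranges given.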

PCR program was 94 °C - 4 min., followed by 30 cycles of 94 °C -
40 sec, 62 °C - 1 min., 72 °C - 2.5 min. Amplification was performed with
Expand High Fidelity (Boehringer Mannheim). The resulting ca. 900 bp
hp3 PCR product was digested with BfrI and PvuII. Clone 257548 (phpa1)
was digested with EcoRI, followed by end filling and was then further
digested with BfrI. Thereafter the PvuII - BfrI fragment of the hp3 PCR
product was cloned into the blunt end - BfrI end of clone phpa1 which
resulted in having the entire cDNA cloned in pT3T7-pac vector, designated
phpa2.

RT-PCR: RNA was prepared using TRI-Reagent (Molecular
Research Center, Inc.) according to the manufacturer's instructions. 1.25 µg
were taken for reverse transcription reaction using MuMLV Reverse
transcriptase (Gibco BRL) and Oligo (dT)15 primer, SEQ ID NO:5,
(Promega). Amplification of the resultant first strand cDNA was
performed with Taq polymerase (Promega). The following primers were
used:

HPU-355: 5'-TTCGATCCCAAGAAGGAATCAAC-3', SEQ ID NO:6,
nucleotides 372-394 in SEQ ID NOs:9 or 11.

HPL-229: 5'-GTAGTGATGCCATGTAACTGAATC-3', SEQ ID NO:7,
nucleotides 933-956 in SEQ ID NOs:9 or 11.

PCR program: 94 °C - 4 min., followed by 30 cycles of 94 °C - 40 
sec, 62 °C - 1 min., 72 °C - 1 min. 
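
The primer coordinates given above imply an expected RT-PCR product size. The following Python sketch computes it, under the assumption that the template is contiguous with SEQ ID NO:9 between the two primers (an alternatively spliced transcript would give a different size). The function name is hypothetical.

```python
def expected_amplicon_length(forward_start, reverse_end):
    """Length in bp of a PCR product spanning forward_start..reverse_end (1-based, inclusive)."""
    if reverse_end < forward_start:
        raise ValueError("reverse primer must lie downstream of the forward primer")
    return reverse_end - forward_start + 1


# Coordinates quoted in the text (SEQ ID NO:9 numbering).
HPU_355 = (372, 394)
HPL_229 = (933, 956)

print(expected_amplicon_length(HPU_355[0], HPL_229[1]))  # 585
```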

Alternatively, total RNA was prepared from cell cultures using
Tri-reagent (Molecular Research Center, Inc.) according to the manufacturer's
recommendation. Poly A+ RNA was isolated from total RNA using mRNA
separator (Clontech). Reverse transcription was performed with total RNA
using Superscript II (GibcoBRL). PCR was performed with Expand high
fidelity (Boehringer Mannheim). Primers used for amplification were as
follows:

Hpu-685, 5'-GAGCAGCCAGGTGAGCCCAAGAT-3', SEQ ID NO:24
Hpu-355, 5'-TTCGATCCCAAGAAGGAATCAAC-3', SEQ ID NO:25
Hpu 565, 5'-AGCTCTGTAGATGTGCTATACAC-3', SEQ ID NO:26
Hpl 967, 5'-TCAGATGCAAGCAGCAACTTTGGC-3', SEQ ID NO:27
Hpl 171, 5'-GCATCTTAGCCGTCTTTCTTCG-3', SEQ ID NO:28
Hpl 229, 5'-GTAGTGATGCCATGTAACTGAATC-3', SEQ ID NO:29

PCR reaction was performed as follows: 94 °C 3 minutes, followed
by 32 cycles of 94 °C 40 seconds, 64 °C 1 minute, 72 °C 3 minutes, and one
cycle 72 °C, 7 minutes.

Expression of recombinant heparanase in insect cells: Cells: High
Five and Sf21 insect cell lines were maintained as monolayer cultures in
SF900II-SFM medium (GibcoBRL).

Recombinant Baculovirus: Recombinant virus containing the hpa
gene was constructed using the Bac to Bac system (GibcoBRL). The
transfer vector pFastBac was digested with SalI and NotI and ligated with a
1.7 kb fragment of phpa2 digested with XhoI and NotI. The resulting
plasmid was designated pFasthpa2. An identical plasmid designated
pFasthpa4 was prepared as a duplicate and both independently served for
further experimentations. Recombinant bacmid was generated according to
the instructions of the manufacturer with pFasthpa2, pFasthpa4 and with
pFastBac. The latter served as a negative control. Recombinant bacmid
DNAs were transfected into Sf21 insect cells. Five days after transfection
recombinant viruses were harvested and used to infect High Five insect
cells, 3 x 10^6 cells in T-25 flasks. Cells were harvested 2 - 3 days after
infection. 4 x 10^6 cells were centrifuged and resuspended in a reaction
buffer containing 20 mM phosphate citrate buffer, 50 mM NaCl. Cells
underwent three cycles of freeze and thaw and lysates were stored at -80
°C. Conditioned medium was stored at 4 °C.

Partial purification of recombinant heparanase: Partial
purification of recombinant heparanase was performed by
heparin-Sepharose column chromatography followed by Superdex 75 column gel
filtration. Culture medium (150 ml) of Sf21 cells infected with pFhpa4
virus was subjected to heparin-Sepharose chromatography. Elution of 1 ml
fractions was performed with 0.35 - 2 M NaCl gradient in presence of 0.1 %
CHAPS and 1 mM DTT in 10 mM sodium acetate buffer, pH 5.0. A 25 µl
sample of each fraction was tested for heparanase activity. Heparanase
activity was eluted at the range of 0.65 - 1.1 M NaCl (fractions 18-26,
Figure 10a). 5 µl of each fraction was subjected to 15 % SDS-polyacrylamide
gel electrophoresis followed by silver nitrate staining.
Active fractions eluted from heparin-Sepharose (Figure 10a) were pooled
and concentrated (x 6) on YM3 cut-off membrane. 0.5 ml of the
concentrated material was applied onto 30 ml Superdex 75 FPLC column
equilibrated with 10 mM sodium acetate buffer, pH 5.0, containing 0.8 M
NaCl, 1 mM DTT and 0.1 % CHAPS. Fractions (0.56 ml) were collected at
a flow rate of 0.75 ml/min. Aliquots of each fraction were tested for
heparanase activity and were subjected to SDS-polyacrylamide gel
electrophoresis followed by silver nitrate staining (Figure 11b).

PCR amplification of genomic DNA: 94 °C 3 minutes, followed by
32 cycles of 94 °C 45 seconds, 64 °C 1 minute, 68 °C 5 minutes, and one
cycle at 72 °C, 7 minutes. Primers used for amplification of genomic DNA
included:

GHpu-L3 5'-AGGCACCCTAGAGATGTTCCAG-3', SEQ ID NO:30
GHpl-L6 5'-GAAGATTTCTGTTTCCATGACGTG-3', SEQ ID NO:31.

Screening of genomic libraries: A human genomic library in
Lambda phage EMBL3 SP6/T7 (Clontech, Palo Alto, CA) was screened.
5 x 10^5 plaques were plated at 5 x 10* pfu/plate on NZCYM agar/top
agarose plates. Phages were adsorbed on nylon membranes in duplicates
(Qiagen). Hybridization was performed at 65 °C in 5 x SSC, 5 x Denhardt's,
10 % dextran sulfate, 100 µg/ml salmon sperm DNA, 32P-labeled probe (10$
cpm/ml). A 1.6 kb fragment, containing the entire hpa cDNA, was labeled
by random priming (Boehringer Mannheim). Following hybridization,
membranes were washed once with 2 x SSC, 0.1 % SDS at 65 °C for 20
minutes, and twice with 0.2 x SSC, 0.1 % SDS at 65 °C for 15 minutes.
Hybridizing plaques were picked, and plated at 100 pfu/plate.
Hybridization was performed as above and single isolated positive plaques
were picked.

Phage DNA was extracted using a Lambda DNA extraction kit
(Qiagen). DNA was digested with XhoI and EcoRI, separated on 0.7 %
agarose gel and transferred to nylon membrane Hybond N+ (Amersham).
Hybridization and washes were performed as above.

cDNA Sequence analysis: Sequence determinations were performed
with vector specific and gene specific primers, using an automated DNA
sequencer (Applied Biosystems, model 373A). Each nucleotide was read
from at least two independent primers.

Genomic sequence analysis: Large-scale sequencing was performed
by Commonwealth Biotechnology Incorporation.

Isolation of mouse hpa: Mouse hpa cDNA was amplified from
either Marathon ready cDNA library of mouse embryo or from mRNA
isolated from mouse melanoma cell line BL6, using the Marathon RACE kit
from Clontech. Both procedures were performed according to the
manufacturer's recommendation.

Primers used for PCR amplification of mouse hpa:
Mhpl773 5'-CCACACTGAATGTAATACTGAAGTG-3', SEQ ID NO:32
MHpl736 5'-CGAAGCTCTGGAACTCGGCAAG-3', SEQ ID NO:33
MHpl83 5'-GCCAGCTGCAAAGGTGTTGGAC-3', SEQ ID NO:34
Mhpl152 5'-AACACCTGCCTCATCACGACTTC-3', SEQ ID NO:35
Mhpl114 5'-GCCAGGCTGGCGTCGATGGTGA-3', SEQ ID NO:36
MHpl103 5'-GTCGATGGTGATGGACAGGAAC-3', SEQ ID NO:37
Ap1 5'-GTAATACGACTCACTATAGGGC-3', SEQ ID NO:38 (Genome walker)
Ap2 5'-ACTATAGGGCACGCGTGGT-3', SEQ ID NO:39 (Genome walker)
Ap1 5'-CCATCCTAATACGACTCACTATAGGGC-3', SEQ ID NO:40 (Marathon RACE)
Ap2 5'-ACTCACTATAGGGCTCGAGCGGC-3', SEQ ID NO:41 (Marathon RACE)

Southern analysis of genomic DNA: Genomic DNA was extracted
from animal or from human blood using Blood and cell culture DNA maxi
kit (Qiagen). DNA was digested with EcoRI, separated by gel
electrophoresis and transferred to a nylon membrane Hybond N+
(Amersham). Hybridization was performed at 68 °C in 6 x SSC, 1 % SDS,
5 x Denhardt's, 10 % dextran sulfate, 100 µg/ml salmon sperm DNA, and
32P-labeled probe. A 1.6 kb fragment, containing the entire hpa cDNA, was
used as a probe. Following hybridization, the membrane was washed with 3
x SSC, 0.1 % SDS, at 68 °C and exposed to X-ray film for 3 days.
Membranes were then washed with 1 x SSC, 0.1 % SDS, at 68 °C and were
reexposed for 5 days.

Construction of hpa promoter-GFP expression vector: Lambda
DNA of phage L3 was digested with SacI and BglII, resulting in a 1712 bp
fragment which contained the hpa promoter (877-2688 of SEQ ID NO:42).
The pEGFP-1 plasmid (Clontech) was digested with BglII and SacI and
ligated with the 1712 bp fragment of the hpa promoter sequence. The
resulting plasmid was designated phpEGL. A second hpa promoter-GFP
plasmid was constructed containing a shorter fragment of the hpa promoter
region: phpEGL was digested with HindIII, and the resulting 1095 bp
fragment (nucleotides 1593-2688 of SEQ ID NO:42) was ligated with
HindIII digested pEGFP-1. The resulting plasmid was designated phpEGS.

Computer analysis of sequences: Homology searches were 
performed using several computer servers, and various databases. Blast 2.0 
service, at the NCBI server was used to screen the protein database swplus 
and DNA databases such as GenBank, EMBL, and the EST databases. 
Blast 2.0 search was performed using the basic search option of the NCBI 
server. Sequence analysis and alignments were done using the DNA 
sequence analysis software package developed by the Genetics Computer
Group (GCG) at the University of Wisconsin. Alignments of two sequences
were performed using Bestfit (gap creation penalty - 12, gap extension
penalty - 4). Protein homology search was performed with the
Smith-Waterman algorithm, using the Bioaccelerator platform developed by
Compugen. The protein database swplus was searched using the following
parameters: gapop: 10.0, gapext: 0.5, matrix: blosum62. Blocks homology 
was performed using the Blocks WWW server developed at Fred 
Hutchinson Cancer Research Center in Seattle, Washington, USA. 
Secondary structure prediction was performed using the PHD server - 
Profile network Prediction Heidelberg. Fold recognition (threading) was 
performed using the UCLA-DOE structure prediction server. The method 
used for prediction was gonnet+predss. Alignment of three sequences was
performed using the pileup application (gap creation penalty - 5, gap
extension penalty - 1). Promoter analysis was performed using TSSW and
TSSG programs (BCM Search Launcher Human Genome Center, Baylor 
College of Medicine, Houston TX). 
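
For orientation, the Smith-Waterman local alignment with separate gap opening and gap extension penalties can be sketched as follows. This Python sketch is illustrative only and is not the Bioaccelerator implementation: it scores residues with a simple match/mismatch scheme (an assumption standing in for the blosum62 matrix), it assumes that the first position of a gap costs gap_open and each further position costs gap_extend, and it returns only the best local score. The function name and default scores are hypothetical; the defaults for gap_open and gap_extend mirror the gapop and gapext values quoted above.

```python
def smith_waterman_score(seq_a, seq_b, match=5.0, mismatch=-4.0,
                         gap_open=10.0, gap_extend=0.5):
    """Best local alignment score with affine gap penalties (Gotoh recurrences)."""
    neg_inf = float("-inf")
    cols = len(seq_b) + 1
    h_prev = [0.0] * cols        # H[i-1][*]: best score of an alignment ending at (i-1, j)
    f_prev = [neg_inf] * cols    # F[i-1][*]: best score ending with a gap in seq_b
    best = 0.0
    for i in range(1, len(seq_a) + 1):
        h_curr = [0.0] * cols
        f_curr = [neg_inf] * cols
        e = neg_inf              # E[i][j]: best score ending with a gap in seq_a
        for j in range(1, cols):
            e = max(e - gap_extend, h_curr[j - 1] - gap_open)
            f_curr[j] = max(f_prev[j] - gap_extend, h_prev[j] - gap_open)
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            h_curr[j] = max(0.0, h_prev[j - 1] + sub, e, f_curr[j])
            best = max(best, h_curr[j])
        h_prev, f_prev = h_curr, f_curr
    return best


# The tryptic peptide of SEQ ID NO:8 aligned to itself embedded in flanking residues.
print(smith_waterman_score("YGPDVGQPR", "AAYGPDVGQPRAA"))
```

A production search would replace the match/mismatch scores with a substitution matrix lookup and add a traceback to recover the aligned segments.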

EXAMPLE 1 
Cloning of human hpa cDNA 

Purified fraction of heparanase isolated from human hepatoma cells 
(SK-hep-1) was subjected to tryptic digestion and microsequencing. EST 
(Expressed Sequence Tag) databases were screened for homology to the 
back translated DNA sequences corresponding to the obtained peptides. 
Two EST sequences (accession Nos. N41349 and N45367) contained a 
DNA sequence encoding the peptide YGPDVGQPR (SEQ ID NO:8). 
These two sequences were derived from clones 257548 and 260138 
(I.M.A.G.E Consortium) prepared from 8 to 9 weeks placenta cDNA library 
(Soares). Both clones, which were found to be identical, contained an insert
of 1020 bp which included an open reading frame (ORF) of 973 bp 
followed by a 3' untranslated region of 27 bp and a Poly A tail. No 
translation start site (AUG) was identified at the 5' end of these clones. 
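
The database screen described above can be pictured from the opposite direction: each candidate EST is translated in all six reading frames and the translations are searched for the peptide. The following Python sketch uses the standard genetic code. The function names and the toy EST sequence are hypothetical and serve only to illustrate the search; the toy sequence is not taken from accession Nos. N41349 or N45367.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate DNA in frame 0; unknown codons become 'X', stops become '*'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Yield (strand, offset, protein) for all six reading frames."""
    forward = dna.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    for strand_name, strand in (("+", forward), ("-", reverse)):
        for offset in range(3):
            yield strand_name, offset, translate(strand[offset:])


def find_peptide(dna, peptide):
    """Return the (strand, offset) frames whose translation contains the peptide."""
    return [(strand, offset)
            for strand, offset, protein in six_frame_translations(dna)
            if peptide in protein]


# Hypothetical toy EST: two leading bases, one possible coding of YGPDVGQPR, one trailing base.
toy_est = "GC" + "TATGGTCCTGATGTTGGTCAACCTCGT" + "A"
print(find_peptide(toy_est, "YGPDVGQPR"))
```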

Cloning of the missing 5' end was performed by PCR amplification 
of DNA from a placenta Marathon RACE cDNA composite. A 900 bp
fragment (designated hp3), partially overlapping with the identified 3'
encoding EST clones, was obtained.

The joined cDNA fragment, 1721 bp long (SEQ ID NO:9), contained
an open reading frame which encodes, as shown in Figure 1 and SEQ ID
NO:11, a polypeptide of 543 amino acids (SEQ ID NO:10) with a calculated
molecular weight of 61,192 daltons. The 3' end of the partial cDNA inserts
contained in clones 257548 and 260138 started at nucleotide G721 of SEQ
ID NO:9 and Figure 1.
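
A calculated molecular weight of this kind is conventionally obtained by summing average residue masses and adding one water molecule for the free termini. The following Python sketch shows the calculation. Since SEQ ID NO:10 is not reproduced in this section, the sketch is applied to the tryptic peptide of SEQ ID NO:8. The residue masses are standard average values, and it is an assumption that the calculation reported above used comparable values; the function name is hypothetical.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153


def molecular_weight(protein):
    """Average molecular weight of an unmodified polypeptide, in daltons."""
    return sum(RESIDUE_MASS[residue] for residue in protein.upper()) + WATER


print(round(molecular_weight("YGPDVGQPR"), 2))

# Average mass per residue implied by the figures quoted in the text.
print(round(61192 / 543, 2))
```

The quoted figures correspond to an average of roughly 112.7 daltons per residue.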

As further shown in Figure 1, there was a single sequence 
discrepancy between the EST clones and the PCR amplified sequence, 
which led to an amino acid substitution from Tyr246 in the EST to Phe246 in
the amplified cDNA. The nucleotide sequence of the PCR amplified cDNA 
fragment was verified from two independent amplification products. The 
new gene was designated hpa. 

As stated above, the 3' end of the partial cDNA inserts contained in 
EST clones 257548 and 260138 started at nucleotide 721 of hpa (SEQ ID 
NO:9). The ability of the hpa cDNA to form stable secondary structures, 
such as stem and loop structures involving nucleotide stretches in the 
vicinity of position 721 was investigated using computer modeling. It was 
found that stable stem and loop structures are likely to be formed involving 
nucleotides 698-724 (SEQ ID NO:9). In addition, a high GC content, up to 
70 %, characterizes the 5' end region of the hpa gene, as compared to
only about 40 % in the 3' region. These findings may explain the premature
termination and therefore lack of 5' ends in the EST clones.
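
A regional comparison of GC content such as the one described above can be made with a sliding window. The following Python sketch is illustrative only: the function names, the window and step sizes and the toy sequence are hypothetical, and the hpa sequence itself is not reproduced here.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def sliding_gc(seq, window=100, step=50):
    """Return (start, gc_fraction) for each full window along the sequence."""
    return [(start, gc_fraction(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]


# Toy sequence: a GC-rich stretch followed by an AT-rich stretch.
toy = "GCGGCCGCGA" * 10 + "ATTAGATACA" * 10
for start, fraction in sliding_gc(toy, window=50, step=50):
    print(start, round(fraction, 2))
```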

To examine the ability of the hpa gene product to catalyze
degradation of heparan sulfate in an in vitro assay, the entire open reading
frame was expressed in insect cells, using the Baculovirus expression
system. Extracts of cells, infected with virus containing the hpa gene,
demonstrated a high level of heparan sulfate degradation activity, while
cells infected with a similar construct containing no hpa gene had no such
activity, nor did non-infected cells. These results are further demonstrated
in the following Examples.



EXAMPLE 2
Degradation of soluble ECM-derived HSPG

Monolayer cultures of High Five cells were infected (72 h, 28 °C)
with recombinant Baculovirus containing the pFasthpa plasmid or with
control virus containing an insert free plasmid. The cells were harvested
and lysed in heparanase reaction buffer by three cycles of freezing and
thawing. The cell lysates were then incubated (18 h, 37 °C) with sulfate
labeled, ECM-derived HSPG (peak I), followed by gel filtration analysis
(Sepharose 6B) of the reaction mixture.

As shown in Figure 2, the substrate alone included almost entirely
high molecular weight (Mr) material eluted next to V0 (peak I, fractions
5-20, Kav < 0.35). A similar elution pattern was obtained when the HSPG
substrate was incubated with lysates of cells that were infected with control
virus. In contrast, incubation of the HSPG substrate with lysates of cells
infected with the hpa containing virus resulted in a complete conversion of
the high Mr substrate into low Mr labeled degradation fragments (peak II,
fractions 22-35, 0.5 < Kav < 0.75).

Fragments eluted in peak II were shown to be degradation products
of heparan sulfate, as they were (i) 5- to 6-fold smaller than intact heparan
sulfate side chains (Kav approx. 0.33) released from ECM by treatment with
either alkaline borohydride or papain; and (ii) resistant to further digestion
with papain or chondroitinase ABC, and susceptible to deamination by
nitrous acid (6, 11). Similar results (not shown) were obtained with
Sf21 cells. Again, heparanase activity was detected in cells infected with
the hpa containing virus (pFhpa), but not with control virus (pF). This
result was obtained with two independently generated recombinant viruses.
Lysates of control, non-infected High Five cells failed to degrade the HSPG
substrate.

In subsequent experiments, the labeled HSPG substrate was 
incubated with medium conditioned by infected High Five or Sf21 cells. 

As shown in Figures 3a-b, heparanase activity, reflected by the
conversion of the high Mr peak I substrate into the low Mr peak II which
represents HS degradation fragments, was found in the culture medium of
cells infected with the pFhpa2 or pFhpa4 viruses, but not with the control
pF1 or pF2 viruses. No heparanase activity was detected in the culture
medium of control non-infected High Five or Sf21 cells.

The medium of cells infected with the pFhpa4 virus was passed
through a 50 kDa cut off membrane to obtain a crude estimation of the
molecular weight of the recombinant heparanase enzyme. As demonstrated
in Figure 4, all the enzymatic activity was retained in the upper
compartment and there was no activity in the flow through (<50 kDa)
material. This result is consistent with the expected molecular weight of the
hpa gene product.

In order to further characterize the hpa product, the inhibitory effect
of heparin, a potent inhibitor of heparanase mediated HS degradation (40),
was examined.

As demonstrated in Figures 5a-b, conversion of the peak I substrate
into peak II HS degradation fragments was completely abolished in the
presence of heparin.

Altogether, these results indicate that the heparanase enzyme is 
expressed in an active form by insect cells infected with Baculovirus 
containing the newly identified human hpa gene. 



EXAMPLE 3
Degradation of HSPG in intact ECM

Next, the ability of intact infected insect cells to degrade HS in
intact, naturally produced ECM was investigated. For this purpose, High
Five or Sf21 cells were seeded on metabolically sulfate labeled ECM
followed by infection (48 h, 28 °C) with either the pFhpa4 or control pF2
viruses. The pH of the medium was then adjusted to pH 6.2-6.4 and the
cells further incubated with the labeled ECM for another 48 h at 28 °C or 24
h at 37 °C. Sulfate labeled material released into the incubation medium
was analyzed by gel filtration on Sepharose 6B.

As shown in Figures 6a-b and 7a-b, incubation of the ECM with cells 
infected with the control pF2 virus resulted in a constant release of labeled 
material that consisted almost entirely (>90%) of high Mr fragments (peak 
I) eluted with or next to V0. It was previously shown that a proteolytic 
activity residing in the ECM itself and/or expressed by cells is responsible 
for release of the high Mr material (6). This nearly intact HSPG provides a 
soluble substrate for subsequent degradation by heparanase, as also 
indicated by the relatively large amount of peak I material accumulating 
when the heparanase enzyme is inhibited by heparin (6, 7, 12, Figure 9). On 
the other hand, incubation of the labeled ECM with cells infected with the 
pFhpa4 virus resulted in release of 60-70% of the ECM-associated 
radioactivity in the form of low Mr sulfate-labeled fragments (peak II, 0.5 
< Kav < 0.75), regardless of whether the infected cells were incubated with 
the ECM at 28 °C or 37 °C. Control intact non-infected Sf21 or High Five 
cells failed to degrade the ECM HS side chains. 
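
The peak assignments above rest on the gel filtration partition coefficient Kav. The text does not define it, so the following is only a minimal sketch that assumes the conventional definition Kav = (Ve - V0) / (Vt - V0), where Ve is the elution volume of a fraction, V0 the void volume and Vt the total column volume. The function names, the 0.1 tolerance used for "eluted with or next to V0" and the example volumes are hypothetical and are not taken from the experiments.

```python
def k_av(elution_volume, void_volume, total_volume):
    """Return Kav = (Ve - V0) / (Vt - V0) for one gel filtration fraction."""
    if total_volume <= void_volume:
        raise ValueError("total volume must exceed the void volume")
    return (elution_volume - void_volume) / (total_volume - void_volume)


def classify_peak(kav, void_tolerance=0.1):
    """Assign a fraction to peak I or peak II.

    Peak II uses the 0.5 < Kav < 0.75 window quoted in the text.
    void_tolerance is an assumed cut-off for material eluting at or
    next to the void volume (peak I).
    """
    if kav <= void_tolerance:
        return "peak I"
    if 0.5 < kav < 0.75:
        return "peak II"
    return "unassigned"


if __name__ == "__main__":
    # Hypothetical column: V0 = 10 ml, Vt = 30 ml.
    v0, vt = 10.0, 30.0
    for ve in (10.5, 23.0):
        kav = k_av(ve, v0, vt)
        print(f"Ve = {ve} ml, Kav = {kav:.3f}, {classify_peak(kav)}")
```

Expressing elution positions as Kav rather than raw volumes makes profiles from columns of different size comparable, which is presumably why the peak II window is quoted in those terms.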

In subsequent experiments, as demonstrated in Figures 8a-b, High 
Five and Sf21 cells were infected (96 h, 28 °C) with pFhpa4 or control pF1 
viruses and the culture medium incubated with sulfate-labeled ECM. Low 
Mr HS degradation fragments were released from the ECM only upon 
incubation with medium conditioned by pFhpa4 infected cells. As shown in 
Figure 9, production of these fragments was abolished in the presence of 
heparin. No heparanase activity was detected in the culture medium of 
control, non-infected cells. These results indicate that the heparanase 
enzyme expressed by cells infected with the pFhpa4 virus is capable of 
degrading HS when complexed to other macromolecular constituents (i.e. 
fibronectin, laminin, collagen) of a naturally produced intact ECM, in a 
manner similar to that reported for highly metastatic tumor cells or activated 
cells of the immune system (6, 7). 



EXAMPLE 4 
Purification of recombinant human heparanase 

The recombinant heparanase was partially purified from medium of 
pFhpa4 infected Sf21 cells by Heparin-Sepharose chromatography (Figure 
10a) followed by gel filtration of the pooled active fractions over an FPLC 
Superdex 75 column (Figure 11a). A ~63 kDa protein was observed, 
whose quantity, as detected by silver stained SDS-polyacrylamide gel 
electrophoresis, correlated with heparanase activity in the relevant column 
fractions (Figures 10b and 11b, respectively). This protein was not detected 
in the culture medium of cells infected with the control pF1 virus that was 
subjected to a similar fractionation on Heparin-Sepharose (not shown). 



EXAMPLE 5 
Expression of the human hpa cDNA in various cell types, organs and tissues 



Referring now to Figures 12a-e, RT-PCR was applied to evaluate the 
expression of the hpa gene by various cell types and tissues. For this 
purpose, total RNA was reverse transcribed and amplified. The expected 
585 bp long cDNA was clearly demonstrated in human kidney, placenta (8 
and 11 weeks) and mole tissues, as well as in freshly isolated and 
short-term (1.5-48 h) cultured human placental cytotrophoblastic cells (Figure 
12a), all known to express a high heparanase activity (41). The hpa 
transcript was also expressed by normal human neutrophils (Figure 12b). In 
contrast, there was no detectable expression of the hpa mRNA in embryonic 
human muscle tissue, thymus, heart, and adrenal (Figure 12b). The hpa gene 
was expressed by several, but not all, human bladder carcinoma cell lines 
(Figure 12c), SK hepatoma (SK-hep-1), ovarian carcinoma (OV 1063), 
breast carcinoma (435, 231), melanoma and megakaryocyte (DAMI, 
CHRF) human cell lines (Figures 12d-e). 

The expression pattern of the hpa transcript described above 
correlated very well with the heparanase activity levels 
determined in various tissues and cell types (not shown). 



EXAMPLE 6 
Isolation of an extended 5' end of hpa cDNA from human SK-hep1 cell line 

The 5' end of hpa cDNA was isolated from the human SK-hep1 cell line 
by PCR amplification using the Marathon RACE (rapid amplification of 
cDNA ends) kit (Clontech). Total RNA was prepared from SK-hep1 cells 
using the TRI-Reagent (Molecular Research Center Inc.) according to the 
manufacturer's instructions. Poly A+ RNA was isolated using the mRNA 
separator kit (Clontech). 

The Marathon RACE SK-hep1 cDNA composite was constructed 
according to the manufacturer's recommendations. The first round of 
amplification was performed using an adaptor specific primer AP1: 
5'-CCATCCTAATACGACTCACTATAGGGC-3', SEQ ID NO:1, and a hpa 
specific antisense primer hpl-629: 5'-CCCCAGGAGCAGCAGCATCAG-3', 
SEQ ID NO:17, corresponding to nucleotides 119-99 of SEQ ID NO:9. 
The resulting PCR product was subjected to a second round of amplification 
using an adaptor specific nested primer AP2: 
5'-ACTCACTATAGGGCTCGAGCGGC-3', SEQ ID NO:3, and a hpa 
specific antisense nested primer hpl-666: 
5'-AGGCTTCGAGCGCAGCAGCAT-3', SEQ ID NO:18, corresponding to 
nucleotides 83-63 of SEQ ID NO:9. The PCR program was as follows: a 
hot start of 94 °C for 1 minute, followed by 30 cycles of 90 °C - 30 seconds, 
68 °C - 4 minutes. The resulting 300 bp DNA fragment was extracted from 
an agarose gel and cloned into the vector pGEM-T Easy (Promega). The 
resulting recombinant plasmid was designated pHPSK1. 
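
The gene specific primers above are quoted together with the coordinates they occupy in SEQ ID NO:9. As a consistency check, each primer length should equal the length of its quoted span, and the nested primer should lie 5' of the first-round primer. The following is a minimal sketch of that check, assuming inclusive 1-based coordinates and plain uppercase DNA strings; the helper names are illustrative and are not part of the original protocol.

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def span_length(first, last):
    """Length of an inclusive coordinate span, in either orientation."""
    return abs(first - last) + 1


# (name, sequence 5'->3', quoted coordinates in SEQ ID NO:9)
ANTISENSE_PRIMERS = [
    ("hpl-629", "CCCCAGGAGCAGCAGCATCAG", 119, 99),   # first round
    ("hpl-666", "AGGCTTCGAGCGCAGCAGCAT", 83, 63),    # nested
]


def check_primers(primers):
    """Return True if every primer length matches its coordinate span."""
    return all(len(seq) == span_length(a, b) for _, seq, a, b in primers)


def is_nested(outer, inner):
    """For antisense primers, the nested primer must start 5' of the outer one."""
    return inner[2] < outer[3]


if __name__ == "__main__":
    print("lengths consistent:", check_primers(ANTISENSE_PRIMERS))
    print("nested order:", is_nested(*ANTISENSE_PRIMERS))
    for name, seq, _, _ in ANTISENSE_PRIMERS:
        print(name, "anneals to sense strand", reverse_complement(seq))
```

The reverse complement gives the sense-strand sequence that each antisense primer is expected to anneal to, which is the string one would search for in SEQ ID NO:9.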

The nucleotide sequence of the pHPSK1 insert was determined and it 
was found to contain 62 nucleotides of the 5' end of the placenta hpa cDNA 
(SEQ ID NO:9) and an additional 178 nucleotides upstream, the first 178 
nucleotides of SEQ ID NOs:13 and 15. 

A single nucleotide discrepancy was identified between the SK-hep1 
cDNA and the placenta cDNA. The "T" derivative at position 9 of the 
placenta cDNA (SEQ ID NO:9) is replaced by a "C" derivative at the 
corresponding position 187 of the SK-hep1 cDNA (SEQ ID NO:13). 

The discrepancy is likely to be due to a mutation at the 5' end of the 
placenta cDNA clone, as confirmed by sequence analysis of several 
additional cDNA clones isolated from placenta, which, like the SK-hep1 
cDNA, contained C at position 9 of SEQ ID NO:9. 
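
Because the SK-hep1 cDNA carries 178 additional nucleotides at its 5' end, positions in the placenta cDNA (SEQ ID NO:9) and in the extended cDNA (SEQ ID NO:13) differ by a constant offset. The sketch below illustrates that mapping under the assumption of 1-based numbering in both sequences; the function names are hypothetical.

```python
UPSTREAM_EXTENSION = 178  # extra 5' nucleotides in the SK-hep1 cDNA (from the text)


def placenta_to_skhep1(position):
    """Map a 1-based position in SEQ ID NO:9 to SEQ ID NO:13."""
    if position < 1:
        raise ValueError("positions are 1-based")
    return position + UPSTREAM_EXTENSION


def skhep1_to_placenta(position):
    """Map a 1-based position in SEQ ID NO:13 back to SEQ ID NO:9.

    Returns None for positions inside the 5' extension, which have
    no counterpart in the placenta cDNA.
    """
    if position < 1:
        raise ValueError("positions are 1-based")
    mapped = position - UPSTREAM_EXTENSION
    return mapped if mapped >= 1 else None


if __name__ == "__main__":
    # The T/C discrepancy: position 9 of SEQ ID NO:9.
    print(placenta_to_skhep1(9))
```

With this offset, position 9 of the placenta cDNA maps to position 187 of the SK-hep1 cDNA, in agreement with the positions quoted for the T/C discrepancy.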




The 5' extended sequence of the SK-hep1 hpa cDNA was assembled 
with the sequence of the hpa cDNA isolated from human placenta (SEQ ID 
NO:9). The assembled sequence contained an open reading frame which 
encodes, as shown in SEQ ID NOs:14 and 15, a polypeptide of 592 amino 
acids with a calculated molecular weight of 66,407 daltons. The open 
reading frame is flanked by a 93 bp 5' untranslated region (UTR). 

EXAMPLE 7 

Isolation of the upstream genomic region of the hpa gene 
The upstream region of the hpa gene was isolated using the Genome 
Walker kit (Clontech) according to the manufacturer's recommendations. 
The kit includes five human genomic DNA samples, each digested with a 
different restriction endonuclease creating blunt ends: EcoRV, ScaI, DraI, 
PvuII and SspI. 

The blunt ended DNA fragments are ligated to partially single 
stranded adaptors. The genomic DNA samples were subjected to PCR 
amplification using the adaptor specific primer and a gene specific primer. 
Amplification was performed with Expand High Fidelity (Boehringer 
Mannheim). 

A first round of amplification was performed using the ap1 primer: 
5'-GTAATACGACTCACTATAGGGC-3', SEQ ID NO:19, and the hpa 
specific antisense primer hpl-666: 5'-AGGCTTCGAGCGCAGCAGCAT-3', 
SEQ ID NO:18, corresponding to nucleotides 83-63 of SEQ ID NO:9. 
The PCR program was as follows: a hot start of 94 °C - 3 minutes, followed 
by 36 cycles of 94 °C - 40 seconds, 67 °C - 4 minutes. 

The PCR products of the first amplification were diluted 1:50. One 
µl of the diluted sample was used as a template for a second amplification 
using a nested adaptor specific primer ap2: 
5'-ACTATAGGGCACGCGTGGT-3', SEQ ID NO:20, and a hpa specific 
antisense primer hpl-690: 5'-CTTGGGCTCACCTGGCTGCTC-3', SEQ ID 
NO:21, corresponding to nucleotides 62-42 of SEQ ID NO:9. The resulting 
amplification products were analyzed using agarose gel electrophoresis. 
Five different PCR products were obtained from the five amplification 
reactions. A DNA fragment of approximately 750 bp which was obtained 
from the SspI digested DNA sample was gel extracted. The purified 
fragment was ligated into the plasmid vector pGEM-T Easy (Promega). 
The resulting recombinant plasmid was designated pGHP6905 and the 
nucleotide sequence of the hpa insert was determined. 

A partial sequence of 594 nucleotides is shown in SEQ ID NO:16. 
The last nucleotide in SEQ ID NO:16 corresponds to nucleotide 93 in SEQ 
ID NO:13. The DNA sequence in SEQ ID NO:16 contains the 5' region of the 
hpa cDNA and 501 nucleotides of the genomic upstream region which are 
predicted to contain the promoter region of the hpa gene. 

EXAMPLE 8 
Expression of the 592 amino acids HPA polypeptide in a human 293 cell line 



The 592 amino acids open reading frame (SEQ ID NOs:13 and 15) 
was constructed by ligation of the 110 bp corresponding to the 5' end of the 
SK-hep1 hpa cDNA with the placenta cDNA. More specifically, the 
Marathon RACE - PCR amplification product of the placenta hpa DNA was 
digested with SacI and an approximately 1 kb fragment was ligated into a 
SacI-digested pGHP6905 plasmid. The resulting plasmid was digested with 
EarI and AatII. The EarI sticky ends were blunted and an approximately 
280 bp EarI/blunt-AatII fragment was isolated. This fragment was ligated 
with pFasthpa digested with EcoRI which was blunt ended using Klenow 
fragment and further digested with AatII. The resulting plasmid contained a 
1827 bp insert which includes an open reading frame of 1776 bp, 31 bp of 
3' UTR and 21 bp of 5' UTR. This plasmid was designated pFastLhpa. 

A mammalian expression vector was constructed to drive the 
expression of the 592 amino acids heparanase polypeptide in human cells. 
The hpa cDNA was excised from pFastLhpa with BssHII and NotI. The 
resulting 1850 bp BssHII-NotI fragment was ligated to a mammalian 
expression vector pSI (Promega) digested with MluI and NotI. The 
resulting recombinant plasmid, pSIhpaMet2, was transfected into a human 
293 embryonic kidney cell line. 

Transient expression of the 592 amino acids heparanase was 
examined by western blot analysis and the enzymatic activity was tested 
using the gel shift assay. Both these procedures are described at length in 
U.S. Pat. application No. 09/071,739, filed May 1, 1998, which is 
incorporated by reference as if fully set forth herein. Cells were harvested 3 
days following transfection. Harvested cells were re-suspended in lysis 
buffer containing 150 mM NaCl, 50 mM Tris pH 7.5, 1% Triton X-100, 1 
mM PMSF and protease inhibitor cocktail (Boehringer Mannheim). 40 µg 
protein extract samples were used for separation on an SDS-PAGE gel. Proteins 
were transferred onto a PVDF Hybond-P membrane (Amersham). The 
membrane was incubated with an affinity purified polyclonal 
anti-heparanase antibody, as described in U.S. Pat. application No. 09/071,739. 
A major band of approximately 50 kDa was observed in the transfected 
cells as well as a minor band of approximately 65 kDa. A similar pattern 
was observed in extracts of cells transfected with the pShpa, as 
demonstrated in U.S. Pat. application No. 09/071,739. These two bands 
probably represent two forms of the recombinant heparanase protein 
produced by the transfected cells. The 65 kDa protein probably represents a 
heparanase precursor, while the 50 kDa protein is suggested herein to be the 
processed or mature form. 

The catalytic activity of the recombinant protein expressed in the 
pSIhpaMet2 transfected cells was tested by gel shift assay. Cell extracts of 
transfected and of mock transfected cells were incubated overnight with 
heparin (6 µg in each reaction) at 37 °C, in the presence of 20 mM 
phosphate citrate buffer pH 5.4, 1 mM CaCl2, 1 mM DTT and 50 mM 
NaCl. Reaction mixtures were then separated on a 10 % polyacrylamide 
gel. The catalytic activity of the recombinant heparanase was clearly 
demonstrated by a faster migration of the heparin molecules incubated with 
the transfected cell extract as compared to the control. Faster migration 
indicates the disappearance of high molecular weight heparin molecules and 
the generation of low molecular weight degradation products. 



EXAMPLE 9 
Chromosomal localization of the hpa gene 
Chromosomal mapping of the hpa gene was performed utilizing a 
panel of monochromosomal human/CHO and human/mouse somatic cell 
hybrids, obtained from the UK HGMP Resource Center (Cambridge, 
England). 

40 ng of each of the somatic cell hybrid DNA samples were 
subjected to PCR amplification using the hpa primers: hpu565 
5'-AGCTCTGTAGATGTGCTATACAC-3', SEQ ID NO:22, corresponding 
to nucleotides 564-586 of SEQ ID NO:9 and an antisense primer hpl171 
5'-GCATCTTAGCCGTCTTTCTTCG-3', SEQ ID NO:23, corresponding to 
nucleotides 897-876 of SEQ ID NO:9. 

The PCR program was as follows: a hot start of 94 °C - 3 minutes, 
followed by 7 cycles of 94 °C - 45 seconds, 66 °C - 1 minute, 68 °C - 5 
minutes, followed by 30 cycles of 94 °C - 45 seconds, 62 °C - 1 minute, 68 
°C - 5 minutes, and a 10 minutes final extension at 72 °C. 
The reactions were performed with Expand long PCR (Boehringer 
Mannheim). The resulting amplification products were analyzed using 
agarose gel electrophoresis. As demonstrated in Figure 14, a single band of 
approximately 2.8 kb was obtained from chromosome 4, as well as from the 
control human genomic DNA. A 2.8 kb amplification product is expected 
based on amplification of the genomic hpa clone (data not shown). No 
amplification products were obtained either in the control DNA samples of 
hamster and mouse or in somatic hybrids of other human chromosomes. 
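
The primer coordinates quoted for hpu565 and hpl171 allow the size of the product expected from cDNA to be compared with the approximately 2.8 kb product obtained from genomic DNA. The following is a rough sketch of that arithmetic, assuming inclusive 1-based coordinates in SEQ ID NO:9 and treating 2.8 kb as 2800 bp; the interpretation of the difference as intronic sequence is an inference from the genomic organization described in Example 10, not a statement made in this example.

```python
def amplicon_length(forward_5prime, reverse_5prime):
    """Amplicon length from the 5' positions of the sense and antisense
    primers, both given as inclusive 1-based cDNA coordinates."""
    if reverse_5prime < forward_5prime:
        raise ValueError("antisense primer must lie 3' of the sense primer")
    return reverse_5prime - forward_5prime + 1


# hpu565 starts at nucleotide 564; hpl171 starts at nucleotide 897
# (it is quoted as 897-876 because it is an antisense primer).
CDNA_AMPLICON = amplicon_length(564, 897)

# Approximate size of the product observed on the gel (2.8 kb).
GENOMIC_AMPLICON = 2800

# Sequence present in the genomic product but absent from the cDNA.
EXTRA_GENOMIC = GENOMIC_AMPLICON - CDNA_AMPLICON

if __name__ == "__main__":
    print("cDNA-derived product:", CDNA_AMPLICON, "bp")
    print("additional genomic sequence: about", EXTRA_GENOMIC, "bp")
```

A genomic product much longer than the cDNA-derived size is what makes this primer pair useful for mapping: it distinguishes amplification of the chromosomal gene from any contaminating cDNA template.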



EXAMPLE 10 

Human genomic clone encoding heparanase 

Five plaques were isolated following screening of a human genomic 
library and were designated L3-1, L5-1, L8-1, L10-1 and L6-1. The phage 
DNAs were analyzed by Southern hybridization and by PCR with hpa 
specific and vector specific primers. Southern analysis was performed with 
three fragments of hpa cDNA: a PvuII-BamHI fragment (nucleotides 32-450, 
SEQ ID NO:9), a BamHI-NdeI fragment (nucleotides 451-1102, SEQ 
ID NO:9) and an NdeI-XhoI fragment (nucleotides 1103-1721, SEQ ID 
NO:9). 

Following Southern analysis, phages L3, L6 and L8 were selected for 
further analysis. A scheme of the genomic region and the relative position 
of the three phage clones is depicted in Figure 15. A 2 kb DNA fragment 
containing the gap between phages L6 and L3 was PCR amplified from 
human genomic DNA with two gene specific primers GHpuL3 and 
GHplL6. The PCR product was cloned into the plasmid vector pGEM-T 
Easy (Promega). 

Large scale DNA sequencing of the three Lambda clones and the 
amplified fragment was performed with Lambda purified DNA by primer 
walking. A nucleotide sequence of 44,898 bp was analyzed (Figure 16, 
SEQ ID NO:42). Comparison of the genomic sequence with that of hpa 
cDNA revealed 12 exons separated by 11 introns (Figures 15 and 16). The 
genomic organization of the hpa gene is depicted in Figure 15 (top). The 
sequence includes the coding region from the first ATG to the stop codon 
which spans 39,113 nucleotides, 2742 nucleotides upstream of the first 
ATG and 3043 nucleotides downstream of the stop codon. Splice site 
consensus sequences were identified at exon/intron junctions. 
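
The three segment lengths quoted above should add up to the total length of the analyzed genomic sequence. The following is a minimal sketch of that bookkeeping, assuming 1-based coordinates in SEQ ID NO:42 and that the coding span includes the stop codon; the variable names are illustrative only.

```python
# Segment lengths quoted for SEQ ID NO:42.
UPSTREAM = 2742       # nucleotides 5' of the first ATG
CODING_SPAN = 39113   # first ATG through the stop codon, introns included
DOWNSTREAM = 3043     # nucleotides 3' of the stop codon
TOTAL = 44898         # length of the analyzed genomic sequence

EXONS = 12
INTRONS = EXONS - 1   # exons of a single gene are separated by n - 1 introns


def first_atg_position():
    """1-based position of the A of the first ATG."""
    return UPSTREAM + 1


def last_coding_position():
    """1-based position of the final nucleotide of the coding span."""
    return UPSTREAM + CODING_SPAN


if __name__ == "__main__":
    print("segments sum to total:", UPSTREAM + CODING_SPAN + DOWNSTREAM == TOTAL)
    print("first ATG at:", first_atg_position())
    print("coding span ends at:", last_coding_position())
    print("introns:", INTRONS)
```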

EXAMPLE 11 
Alternative splicing 



Several minor RT-PCR products were obtained from various cell 
types, following amplification with hpa specific primers. Each was found to 
contain a deletion of one or two exons. Some of these PCR products 
contain ORFs, which encode potential shorter proteins. 

Table 1 below summarizes the alternatively spliced products isolated 
from various cell lines. 

Fragments of similar sizes were obtained following amplification 
with two cell lines, placenta and platelets. 

Cell type                  Nucleotides deleted   Exons deleted   ORF 
Platelets                  1047-1267             8, 9 
Platelets                  1154-1267             9 
Platelets                  289-435, 562-735      2, 4 
Sk-hep1, platelets, Zr75   562-735               4 
Sk-hep1 (hepatoma)         561-904               4, 5 
Zr75 (breast carcinoma)    96-203                1 (partial) 



+ 



+ 
+ 
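
Whether a splice variant can retain the downstream reading frame depends, among other things, on whether the number of deleted nucleotides is a multiple of three. The sketch below computes the deleted length for each entry of Table 1, assuming the quoted ranges are inclusive. It is an illustration only: an in-frame deletion is a necessary but not a sufficient condition for an ORF, and the code does not attempt to reproduce the ORF column of the table.

```python
# Deleted nucleotide ranges from Table 1 (inclusive coordinates assumed).
DELETIONS = {
    "Platelets, exons 8-9": [(1047, 1267)],
    "Platelets, exon 9": [(1154, 1267)],
    "Platelets, exons 2 and 4": [(289, 435), (562, 735)],
    "Sk-hep1/platelets/Zr75, exon 4": [(562, 735)],
    "Sk-hep1, exons 4-5": [(561, 904)],
    "Zr75, exon 1 (partial)": [(96, 203)],
}


def deleted_length(ranges):
    """Total number of nucleotides removed by a list of inclusive ranges."""
    return sum(end - start + 1 for start, end in ranges)


def is_in_frame(ranges):
    """True if the deletion removes a whole number of codons."""
    return deleted_length(ranges) % 3 == 0


if __name__ == "__main__":
    for label, ranges in DELETIONS.items():
        frame = "in frame" if is_in_frame(ranges) else "frameshift"
        print(f"{label}: {deleted_length(ranges)} nt deleted, {frame}")
```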



EXAMPLE 12 



Mouse and rat hpa 



EST databases were screened for sequences homologous to the hpa 
gene. Three mouse ESTs were identified (accession Nos. AA177901 from 
mouse spleen, AA067997 from mouse skin, AA47943 from mouse embryo) 
and assembled into an 824 bp cDNA fragment which contains a partial open 
reading frame (lacking a 5' end) of 629 bp and a 3' untranslated region of 
195 bp (SEQ ID NO:12). As shown in Figure 13, the coding region is 80 % 
similar to the 3' end of the hpa cDNA sequence. These ESTs are probably 
cDNA fragments of the mouse hpa homolog that encodes the mouse 
heparanase. 

Searching for consensus protein domains revealed an amino terminal 
homology between the heparanase and several precursor proteins such as 
Procollagen Alpha 1 precursor, Tyrosine-protein kinase-RYK, Fibulin-1, 
Insulin-like growth factor binding protein and several others. The amino 
terminus is highly hydrophobic and contains a potential trans-membrane 
domain. The homology to known signal peptide sequences suggests that it 
could function as a signal peptide for protein localization. 

The amino acid sequence of human heparanase was used to search 
for homologous sequences in the DNA and protein databases. Several 
human ESTs were identified, as well as mouse sequences highly 
homologous to human heparanase. The following mouse ESTs were 
identified: AA177901, AA674378, AA67997, AA047943, AA690179 and 
AI122034, all sharing an identical sequence and corresponding to amino acids 
336-543 of the human heparanase sequence. The entire mouse heparanase 
cDNA was cloned, based on the nucleotide sequence of the mouse ESTs. 
PCR primers were designed and a Marathon RACE was performed using a 
Marathon cDNA library from 15 days mouse embryo (Clontech) and from 
BL6 mouse melanoma cell line. The mouse hpa homologous cDNA was 
isolated following several amplification steps. A 1.1 kb fragment was 
amplified from the mouse embryo Marathon cDNA library. The first cycle of 
amplification was performed with primers mhpl773 and AP1 and the second 
cycle with primers mhpl736 and AP2. A 1.1 kb fragment was then 
amplified from the BL6 Marathon cDNA library. The first cycle of 
amplification was performed with the primers mhpl152 and AP1, and the 
second with mhpl83 and AP2. The combined sequence was homologous to 
nucleotides 157-1702 of the human hpa cDNA, which encode amino acids 
33-543. The 5' end of the mouse hpa gene was isolated from a mouse 
genomic DNA library using the Genome Walker kit (Clontech). A 0.9 kb 
fragment was amplified from a DraI digested Genome Walker DNA library. 
The first cycle of amplification was performed with primers mhpl114 and 
AP1 and the second with primers mhpl103 and AP2. The assembled 
sequence (SEQ ID NOs:43, 45) is 2396 nucleotides long. It contains an 
open reading frame of 1605 nucleotides, which encode a polypeptide of 535 
amino acids (SEQ ID NOs:44, 45), 196 nucleotides of 3' untranslated 
region (UTR), and an upstream sequence which includes the promoter 
region and the 5'-UTR of the mouse hpa cDNA. According to two 
promoter predicting programs, TSSW and TSSG, the transcription start site 
is localized to nucleotide 431 of SEQ ID NOs:43, 45, 163 nucleotides 
upstream of the first ATG codon. The 431 upstream genomic sequence 
contains the promoter region. A TATA box is predicted at position 394 of 
SEQ ID NOs:43, 45. The mouse and the human hpa genes share an 
average homology of 78 % between the nucleotide sequences and 81 % 
similarity between the deduced amino acid sequences. 
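
The open reading frame lengths quoted in Examples 8 and 12 follow directly from the number of encoded amino acids, at three nucleotides per codon. The sketch below checks those figures and the size of the assembled mouse EST fragment; it assumes the quoted ORF lengths exclude the stop codon, and the function name is hypothetical.

```python
CODON_LENGTH = 3  # nucleotides per codon


def coding_nucleotides(n_residues, include_stop=False):
    """Number of nucleotides needed to encode n_residues amino acids."""
    if n_residues < 0:
        raise ValueError("residue count cannot be negative")
    codons = n_residues + (1 if include_stop else 0)
    return codons * CODON_LENGTH


# Values quoted in the text.
HUMAN_RESIDUES, HUMAN_ORF = 592, 1776
MOUSE_RESIDUES, MOUSE_ORF = 535, 1605

# Assembled mouse EST fragment: partial ORF plus 3' untranslated region.
EST_PARTIAL_ORF, EST_3UTR, EST_TOTAL = 629, 195, 824

if __name__ == "__main__":
    print("human:", coding_nucleotides(HUMAN_RESIDUES) == HUMAN_ORF)
    print("mouse:", coding_nucleotides(MOUSE_RESIDUES) == MOUSE_ORF)
    print("EST fragment:", EST_PARTIAL_ORF + EST_3UTR == EST_TOTAL)
```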

A search for hpa homologous sequences using the Blast 2.0 server 
revealed two ESTs from rat: AI060284 (385 nucleotides, SEQ ID NO:46), 
which is homologous to the amino terminus (68 % similarity to amino acids 
12-136) of human heparanase, and AI237828 (541 nucleotides, SEQ ID 
NO:47), which is homologous to the carboxyl terminus (81 % similarity to 
amino acids 500-543) of human heparanase and contains a 3'-UTR. A 
comparison between the human heparanase and the mouse and rat 
homologous sequences is demonstrated in Figure 17. 



EXAMPLE 13 
Prediction of heparanase active site 
A homology search of the heparanase amino acid sequence against the 
DNA and the protein databases revealed no significant homologies. The 
protein secondary structure as predicted by the PHD program consists of 
alternating alpha helices and beta sheets. The fold recognition server of 
UCLA predicted an alpha/beta barrel structure, with under-threshold 
confidence. 

Five of 15 proteins, which were predicted to have most similar folds, 
were glycosyl hydrolases from various organisms: 1xyza - xylanase from 
Clostridium thermocellum, 1pbga - 6-phospho-beta-D-galactosidase from 
Lactococcus lactis, 1amy - alpha-amylase from barley, 1ecea - 
endocellulase from Acidothermus cellulolyticus and 1qbc - 
hexosaminidase alpha chain, glycosyl hydrolase. 

Protein homology search using the bioaccelerator pulled out several 
proteins, including glycosyl hydrolases such as beta-fructofuranosidase 
from Vicia faba (broad bean) and from potato, lactase phlorizin hydrolase 
from human, xylanases from Clostridium thermocellum and from 
Streptomyces halstedii and cellulase from Clostridium thermocellum. 
The Blocks 9.3 database pulled out the active site of glycosyl hydrolases family 
five, which includes cellulases from various bacteria and fungi. A similar 
active site motif is shared by several lysosomal acid hydrolases (63) and 
other glycosyl hydrolases. The common mechanism shared by these 
enzymes involves two glutamic acid residues, a proton donor and a 
nucleophile. 

Despite the lack of an overall homology between the heparanase and 
other glycosyl hydrolases, the amino acid couple Asn-Glu (NE), which is 
characteristic of the proton donor of glycosyl hydrolases of the GH-A clan, 
was found at positions 224-225 of the human heparanase protein sequence. 
As in other clan members, this NE couple is located at the end of a β sheet. 

Considering the relative location of the proton donor and the 
predicted secondary structure, the glutamic acid that functions as 
nucleophile is most likely located at position 343, or at position 396. 
Identification of the active site and the amino acids directly involved in 
hydrolysis opens the way for expression of the defined catalytic domain. In 
addition, it will provide the tools for rational design of enzyme activity 
either by modification of the microenvironment or of the catalytic site itself. 
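
The position of the NE couple can be illustrated on a short stretch of the heparanase sequence. The sketch below uses residues 209-240 as read from the sequence listing for SEQ ID NO:10 and converted here to one-letter code; both the transcription of that stretch and the helper function are assumptions made for illustration and should be checked against the full listing.

```python
# Residues 209-240 of SEQ ID NO:10, transcribed from the sequence
# listing and converted to one-letter code (an assumption of this sketch).
FRAGMENT_START = 209
FRAGMENT = "DYCSSKGYNISWELGNEPNSFLKKADIFINGS"


def find_motif(sequence, motif, first_position=1):
    """Return the 1-based positions at which motif starts in sequence.

    first_position is the residue number of sequence[0].
    """
    hits = []
    for i in range(len(sequence) - len(motif) + 1):
        if sequence[i:i + len(motif)] == motif:
            hits.append(first_position + i)
    return hits


if __name__ == "__main__":
    for start in find_motif(FRAGMENT, "NE", FRAGMENT_START):
        print(f"NE couple at positions {start}-{start + 1}")
```

On this fragment the only NE couple falls at positions 224-225, matching the location given for the putative proton donor.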



EXAMPLE 14 



Expression of hpa antisense in mammalian cell lines 
A mammalian expression vector Hpa2Kepcdna3 was constructed in 
order to express hpa antisense in mammalian cells. hpa cDNA (1.7 kb 
EcoRI fragment) was cloned into the plasmid pCDNA3 in 3'>5' (antisense) 
orientation. The construct was used to transfect MBT2-T50 and T24P cell 
lines. 2 x 10^5 cells in 35 mm plates were transfected using the Fugene 
protocol (Boehringer Mannheim). 48 hours after transfection cells were 
trypsinized and seeded in six well plates. 24 hours later G418 was added to 
initiate selection. The number of colonies per 35 mm plate following 3 
weeks: 



            Antisense   No insert 
T24P        15          60 
MBT-T50     1 

The lower number of colonies obtained after transfection with hpa 
antisense, as compared with the control plasmid, suggests that the 
introduction of hpa antisense interferes with cell growth. This experiment 
demonstrates the use of complementary antisense hpa DNA sequence to 
control heparanase expression in cells. This approach may be used to 
inhibit expression of heparanase in vivo, in, for example, cancer cells and in 
other pathological processes in which heparanase is involved. 



EXAMPLE 15 
Zoo blot 

Hpa cDNA was used as a probe to detect homologous sequences in 
human DNA and in DNA of various animals. The autoradiogram of the 
Southern analysis is presented in Figure 18. Several bands were detected in 
human DNA, which correlated with the expected pattern according to the 
genomic hpa sequence. Several intense bands were detected in all 
mammals, while faint bands were detected in chicken. This correlates with 
the phylogenetic relation between human and the tested animals. The 
intense bands indicate that hpa is conserved among mammals as well as in 
more genetically distant organisms. The multiple band patterns suggest 
that in all animals, as in human, the hpa locus occupies a large genomic 
region. Alternatively, the various bands could represent homologous 
sequences and suggest the existence of a gene family, which can be isolated 
based on their homology to the human hpa reported herein. This 
conservation was actually found between the isolated human hpa cDNA 
and the mouse homologue. 



EXAMPLE 16 



Characterization of the hpa promoter 



The DNA sequence upstream of the hpa first ATG was subjected to 
computational analysis in order to localize the predicted transcription start 
site and to identify potential transcription factor binding sites. Recognition 
of the human PolII promoter region and start of transcription were predicted 
using the TSSW and TSSG programs. Both programs identified a promoter 
region upstream of the coding region. TSSW pointed at nucleotide 2644 
and TSSG at 2635 of SEQ ID NO:42. These two predicted transcription 
start sites are located 4 and 13 nucleotides upstream of the longest hpa 
cDNA isolated by RACE. 

A hpa promoter-GFP reporter vector was constructed in order to 
investigate the regulation of hpa transcription. Two constructs were made, 
containing 1.8 kb and 1.1 kb of the hpa promoter region. The reporter 
vector was transfected into T50-mouse bladder carcinoma cells. Cells 
transfected with both constructs exhibited green fluorescence, which 
indicated the promoter activity of the genomic sequence upstream of the 
hpa-coding region. This reporter vector enables the monitoring of hpa 
promoter activity under various conditions and in different cell types, and the 
characterization of the factors involved in regulation of hpa expression. 

Although the invention has been described in conjunction with 
specific embodiments thereof, it is evident that many alternatives, 
modifications and variations will be apparent to those skilled in the art. 
Accordingly, it is intended to embrace all such alternatives, modifications 
and variations that fall within the spirit and broad scope of the appended 
claims. 
TCACCAAGTA CTTGCGGTTA CCCTATCCTT TTTCTAACAA GCAAGTGGAT AAATACCTTC 1500 
tSgaccttt GGGACCTCAT GGATTACTTT CCAAATCTGT CCAACTCAAT GGTCTAACTC 15 o 
T AAAG AT GGT GGATGATCAA ACCTTGCCAC CTTTAATGGA AAAACCTCTC CGGCCAGGAA 620 
gScactggg CTTGCCAGCT TTCTCATATA gtttttttgt GATAAGAAAT GCCAAAGTTG 1680 
CTGCTTGCAT CTGAAAATAA AATATACTAG TCCTGACACT G 

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 543 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
Met Leu Leu Arg Ser Lys Pro Ala Leu Pro Pro Pro Leu Met Leu Leu 
                5                   10                  15 

Leu Leu Gly Pro Leu Gly Pro Leu Ser Pro Gly Ala Leu Pro Arg Pro 
            20                  25                  30 

Ala Gln Ala Gln Asp Val Val Asp Leu Asp Phe Phe Thr Gln Glu Pro 
        35                  40                  45 

Leu His Leu Val Ser Pro Ser Phe Leu Ser Val Thr Ile Asp Ala Asn 
    50                  55                  60 

Leu Ala Thr Asp Pro Arg Phe Leu Ile Leu Leu Gly Ser Pro Lys Leu 
65                  70                  75                  80 

Arg Thr Leu Ala Arg Gly Leu Ser Pro Ala Tyr Leu Arg Phe Gly Gly 
                85                  90                  95 

-v. «. « - - ■» *» ™ «■ w * si " ~. ™ 

100 105 

Glu Glu Arg Ser Tyr Trp Gin Ser Gin Val Asn Gin AsP He Cys Lys 

120 

115 J "^ u 

Tyr Gly Ser XI. Pro Pro Asp Val Glu Olu Lys - Arg Leu Glu Trp 

135 

130 1 " > ° 

Pro Tyr Gin Glu Gin Leu Leu Leu Arg Glu His Tyr Gin Lys L y* F*. 

150 

145 10 

Lys Asn Ser Thr Tyr Ser Ar g Se, Ser Val Asp V.1 Leu T y r Thr Phe 

165 170 

, n TP11 Tle Phe Gly Leu Asn Ala Leu Leu 
Ala Asn Cys Ser Gly Leu Asp Leu He Phe G y 

180 185 

t rm tid Asn Ser Ser Asn Ala Gin Leu Leu Leu 
Arg Thr Ala Asp Leu Gin Trp Asn ber 

700 

195 zuu 

^ „ Tlo c^r TrD Glu Leu Gly Asn 
A sp Tyr Cys Ser Ser Lys Gly Tyr Asn He Ser Trp 

210 215 

Glu Pro Asn Ser Phe Leu Lys Lys Ala Asp He Phe Xle Asn Gly Ser 

230 

. 1fl r1n Teu His Lys Leu Leu Arg Lys Ser 
r1 „ r in asd Tvr He Gin Leu ma ±*y 
Gin Leu Gly Glu a&p 255 

245 250 

Thr Phe Lys Asn Ala Lys Leu Tyr Gly Pro Asp V.1 Gly Gin Pro Arg 

260 265 
Arg Lys Thr Ala Lys Met Leu L y s Ser Phe Leu Lys Ala Gly Gly Glu 

m= Hi« Tvr Tvr Leu Asn Gly Arg Thr 
val He Asp Ser v.1 Thr Trp Hrs His T y r Tyr 

295 

290 " 

x *=„ cm Asd Val Leu Asp He Phe He 
Ala Thr Arg Glu Asp Phe Leu Asn Pro Asp vai ^ 

305 310 

330 

325 JJU 
340 345 

r1 „ phP Met Trp Leu Asp Lys 
Pro Leu Leu Ser Asp Thr Phe Ala Ala Gly Phe Met P 

355 360 

r-iv Tie Glu Val Val Met Arg Gin Val 
Leu Gly Leu Ser Ala Arg Met Gly He Glu 



370 375 




380 



val Asp Glu Asn Phe Asp Pro 
395 



Phe Phe Gly Ala Gly Asn Tyr His Leu ^ ^ Q 

385 390 

«„ m „ «. T* ~ « - - « «• - ~ - S 

405 4iU 

, Met Ala ser Val Gin Gly Ser Lys Arg Arg Lys Leu Arg 

Lv s Val Leu Met Ala bei va 43Q 

4 25 

420 ^ 

t T.eu His Cys Thr Asn Thr Asp Asn Pro Arg Tyr Lys Glu Gly 

Val Tyr Leu His <-y^ iA 445 

435 440 

„ * ... ~ - - - »» *» ;s T " w * T " 

450 455 
Rrg Leu Pro Tyr Pro Phe Ser Asn Lys Gin Val Asp Lys Tyr Leu Leu 

,,, 470 q '° 

465 

cot ivs Ser Val Gin Leu Asn 
Arg Pro Leu Gly Pro His Gly Leu Leu Ser Lys ^ 

i-m Thr Leu Pro Pro Leu Met 
Gly Leu Thr Leu Lys Met Val Asp Asp Gin Thr ^ 

500 505 

t „ Ara Pro Gly Ser Ser Leu Gly Leu Pro Ala Phe Ser 
Glu Lys Pro Leu Arg Pro biy o 

520 

515 0 

Tyr ser Phe Phe Val He Arg Asn Ala Lys Val Ala Ala Cys II. 

535 

530 D ^ 

(2) INFORMATION FOR SEQ ID NO:11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1121 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11: 

CT AGA GCT TTC GAC : 
TC , CCO CT 0 <*= «C «= ™ °» ™ ™ C " MT ^ CK ^ 

- ~ « r r.: s: e r.: s s r.: z z z 1 

Met Leu Leu Arg Ser Lys fro * l5 

5 X 

m rrr CTC TCC CCT GGC GCC CTG CCC CGA CCT 1 

Leu Leu Gly Pro Leu Gly Pro Leu Ser ^ 

20 2 

„ rnr T?r TTC ACC CAG GAG CCG 

OCG CAA GCA CAG GAC GTC GTG GAC CTG GAC TTC TTC ^ 

Ala Gin Ala Gin Asp Val Val Asp Leu Asp Phe 
35 40 



CTG CAC 




Leu His Leu Val r 6Q 

50 55 

CTG GCC ACG GAC CCG CGG TTC CTC A ^ ^ ^ Lys Leu 

L eu Ala Thr Asp Pro Arg Phe Leu He Le 8Q 

70 

65 

ira _ TTG tCT CCT GCG TAG CTG AGG TTT GGT GGC 350 
CGT ACC TTG GCC AGA GGC TTG TCT ^ ^ ^ ^ Qly 

Rrg Thr Leu Ala Arg Gly Leu ser Pro Al^ ^ 

M M „ « 1 :r „ ™ « ~ * « - r« s r, 398 

Thr Lys Thr Asp Phe Leu He u(j 

,.,,/- ™t &TT TGC AAA 446 

« - r si s s ~ "i s z z z z z ~ - 

Glu Glu Arg Ser Tyr irp i25 
115 12 

nn. PAT GTG GAG GAG AAG TTA CGG TTG GAA TGG 494 

* z z z z z z - - - ~ - - - 

ccc Z «« - - TTG CTA CTC CGA GAA CAC £ - AAA AAG TTC 54 2 
145 

,„ TCT GTft GAT GTG CTA TAG ACT TTT 590 

™ ™ r r g - - - ^ ^ phe 

Lys Asn Ser Thr Tyr Ser Arg 175 

165 

mm rrr CTA AAT GCG TTA TTA 638 

GCA AAC TGC TCA GGA CTG GAC TTG ATC TTT GGC CTA « ^ ^ ^ 

Ala Asn cys Ser Gly Leu Asp Leu lie 
180 185 

*n ,rT TCT AAT GCT CAG TTG CTC CTG 686 
AGA ACA GCA GAT TTG CAG TGG AAC ACT TCT AAT ^ ^ ^ 

Rr g Thr Ala Asp Leu Gin Trp Asn Ser Ser ^ 

«r rrr TAT AAC ATT TCT TGG GAA CTA GGC AAT 734 
GAC TAC TGC TCT TCC AAG GGG TAT AAC A ^ Qy ^ 

Rsp Tyr Cys Ser Ser Lys Gly Tyr Asn He P 

215 

210 

^ rrT PAT ATT TTC ATC AAT GGG TCG 782 

z z z Z Z Z Z Z Z Z u. ~ - r , 

~ ™ - t si z z z z z z z z z z M0 

Gin Leu Gly Glu Asp Tyr lie * 255 

245 

n. rrT TCT GAT GTT GGT CAG CCT CGA 87 8 
prpp TAT GGT CC1 1 ^ 
R CC TTC AAA AAT GCA AAA CTC TAT ^ pro Mg 

Thr Phe Lys Asn Ala Lys Leu Tyr Gly Pro Asp ^ 
260 265 

E ,r AGC TTC CTG AAG GCT GGT GGA GAA 926 
AGA AAG ACG GCT AAG ATG CTG AAG AGC TTC ^ ^ ^ 

ftrg Lys Thr Ala Lys Met Leu Lys Ser Phe 






275 



280 



285 



GTG ATT GAT TCA GTT ACA TGG CAT CAC TAC TAT TTG AAT GGA CGG ACT 974
Val Ile Asp Ser Val Thr Trp His His Tyr Tyr Leu Asn Gly Arg Thr
290 295 300

GCT ACC AGG GAA GAT TTT CTA AAC CCT GAT GTA TTG GAC ATT TTT ATT 1022
Ala Thr Arg Glu Asp Phe Leu Asn Pro Asp Val Leu Asp Ile Phe Ile
305 310 315 320

TCA TCT GTG CAA AAA GTT TTC CAG GTG GTT GAG AGC ACC AGG CCT GGC 1070
Ser Ser Val Gln Lys Val Phe Gln Val Val Glu Ser Thr Arg Pro Gly
325 330 335

AAG AAG GTC TGG TTA GGA GAA ACA AGC TCT GCA TAT GGA GGC GGA GCG 1118
Lys Lys Val Trp Leu Gly Glu Thr Ser Ser Ala Tyr Gly Gly Gly Ala
340 345 350

CCC TTG CTA TCC GAC ACC TTT GCA GCT GGC TTT ATG TGG CTG GAT AAA 1166
Pro Leu Leu Ser Asp Thr Phe Ala Ala Gly Phe Met Trp Leu Asp Lys
355 360 365

TTG GGC CTG TCA GCC CGA ATG GGA ATA GAA GTG GTG ATG AGG CAA GTA 1214
Leu Gly Leu Ser Ala Arg Met Gly Ile Glu Val Val Met Arg Gln Val
370 375 380

TTC TTT GGA GCA GGA AAC TAC CAT TTA GTG GAT GAA AAC TTC GAT CCT 1262
Phe Phe Gly Ala Gly Asn Tyr His Leu Val Asp Glu Asn Phe Asp Pro
385 390 395 400

TTA CCT GAT TAT TGG CTA TCT CTT CTG TTC AAG AAA TTG GTG GGC ACC 1310
Leu Pro Asp Tyr Trp Leu Ser Leu Leu Phe Lys Lys Leu Val Gly Thr
405 410 415

AAG GTG TTA ATG GCA AGC GTG CAA GGT TCA AAG AGA AGG AAG CTT CGA 1358
Lys Val Leu Met Ala Ser Val Gln Gly Ser Lys Arg Arg Lys Leu Arg
420 425 430

GTA TAC CTT CAT TGC ACA AAC ACT GAC AAT CCA AGG TAT AAA GAA GGA 1406
Val Tyr Leu His Cys Thr Asn Thr Asp Asn Pro Arg Tyr Lys Glu Gly
435 440 445

GAT TTA ACT CTG TAT GCC ATA AAC CTC CAT AAC GTC ACC AAG TAC TTG 1454
Asp Leu Thr Leu Tyr Ala Ile Asn Leu His Asn Val Thr Lys Tyr Leu
450 455 460

CGG TTA CCC TAT CCT TTT TCT AAC AAG CAA GTG GAT AAA TAC CTT CTA 1502
Arg Leu Pro Tyr Pro Phe Ser Asn Lys Gln Val Asp Lys Tyr Leu Leu
465 470 475 480

AGA CCT TTG GGA CCT CAT GGA TTA CTT TCC AAA TCT GTC CAA CTC AAT 1550
Arg Pro Leu Gly Pro His Gly Leu Leu Ser Lys Ser Val Gln Leu Asn
485 490 495

GGT CTA ACT CTA AAG ATG GTG GAT GAT CAA ACC TTG CCA CCT TTA ATG 1598
Gly Leu Thr Leu Lys Met Val Asp Asp Gln Thr Leu Pro Pro Leu Met
500 505 510

GAA AAA CCT CTC CGG CCA GGA AGT TCA CTG GGC TTG CCA GCT TTC TCA 1646
Glu Lys Pro Leu Arg Pro Gly Ser Ser Leu Gly Leu Pro Ala Phe Ser
515 520 525

TAT AGT TTT TTT GTG ATA AGA AAT GCC AAA GTT GCT GCT TGC ATC TGA 1694
Tyr Ser Phe Phe Val Ile Arg Asn Ala Lys Val Ala Ala Cys Ile
530 535 540 543

AAA TAA AAT ATA CTA GTC CTG ACA CTG 1721

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 824 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12

CTGGCAAGAA GGTCTGGTTG GGAGAGACGA GCTCAGCTTA CGGTGGCGGT GCACCCTTGC 60
TGTCCAACAC CTTTGCAGCT GGCTTTATGT GGCTGGATAA ATTGGGCCTG TCAGCCCAGA 120
TGGGCATAGA AGTCGTGATG AGGCAGGTGT TCTTCGGAGC AGGCAACTAC CACTTAGTGG 180
ATGAAAACTT TGAGCCTTTA CCTGATTACT GGCTCTCTCT TCTGTTCAAG AAACTGGTAG 240
GTCCCAGGGT GTTACTGTCA AGAGTGAAAG GCCCAGACAG GAGCAAACTC CGAGTGTATC 300
TCCACTGCAC TAACGTCTAT CACCCACGAT ATCAGGAAGG AGATCTAACT CTGTATGTCC 360
TGAACCTCCA TAATGTCACC AAGCACTTGA AGGTACCGCC TCCGTTGTTC AGGAAACCAG 420
TGGATACGTA CCTTCTGAAG CCTTCGGGGC CGGATGGATT ACTTTCCAAA TCTGTCCAAC 480
TGAACGGTCA AATTCTGAAG ATGGTGGATG AGCAGACCCT GCCAGCTTTG ACAGAAAAAC 540
CTCTCCCCGC AGGAAGTGCA CTAAGCCTGC CTGCCTTTTC CTATGGTTTT TTTGTCATAA 600
GAAATGCCAA AATCGCTGCT TGTATATGAA AATAAAAGGC ATACGGTACC CCTGAGACAA 660
AAGCCGAGGG GGGTGTTATT CATAAAACAA AACCCTAGTT TAGGAGGCCA CCTCCTTGCC 720
GAGTTCCAGA GCTTCGGGAG GGTGGGGTAC ACTTCAGTAT TACATTCAGT GTGGTGTTCT 780
CTCTAAGAAG AATACTGCAG GTGGTGACAG TTAATAGCAC TGTG 824

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1899 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13 



GGGAAAGCGA GCAAGGAAGT AGGAGAGAGC CGGGCAGGCG GGGCGGGGTT GGATTGGGAG 60
CAGTGGGAGG GATGCAGAAG AGGAGTGGGA GGGATGGAGG GCGCAGTGGG AGGGGTGAGG 120
AGGCGTAACG GGGCGGAGGA AAGGAGAAAA GGGCGCTGGG GCTCGGCGGG AGGAAGTGCT 180
AGAGCTCTCG ACTCTCCGCT GCGCGGCAGC TGGCGGGGGG AGCAGCCAGG TGAGCCCAAG 240
ATGCTGCTGC GCTCGAAGCC TGCGCTGCCG CCGCCGCTGA TGCTGCTGCT CCTGGGGCCG 300
CTGGGTCCCC TCTCCCCTGG CGCCCTGCCC CGACCTGCGC AAGCACAGGA CGTCGTGGAC 360
CTGGACTTCT TCACCCAGGA GCCGCTGCAC CTGGTGAGCC CCTCGTTCCT GTCCGTCACC 420
ATTGACGCCA ACCTGGCCAC GGACCCGCGG TTCCTCATCC TCCTGGGTTC TCCAAAGCTT 480
CGTACCTTGG CCAGAGGCTT GTCTCCTGCG TACCTGAGGT TTGGTGGCAC CAAGACAGAC 540
TTCCTAATTT TCGATCCCAA GAAGGAATCA ACCTTTGAAG AGAGAAGTTA CTGGCAATCT 600
CAAGTCAACC AGGATATTTG CAAATATGGA TCCATCCCTC CTGATGTGGA GGAGAAGTTA 660
CGGTTGGAAT GGCCCTACCA GGAGCAATTG CTACTCCGAG AACACTACCA GAAAAAGTTC 720
AAGAACAGCA CCTACTCAAG AAGCTCTGTA GATGTGCTAT ACACTTTTGC AAACTGCTCA 780
GGACTGGACT TGATCTTTGG CCTAAATGCG TTATTAAGAA CAGCAGATTT GCAGTGGAAC 840
AGTTCTAATG CTCAGTTGCT CCTGGACTAC TGCTCTTCCA AGGGGTATAA CATTTCTTGG 900
GAACTAGGCA ATGAACCTAA CAGTTTCCTT AAGAAGGCTG ATATTTTCAT CAATGGGTCG 960
CAGTTAGGAG AAGATTATAT TCAATTGCAT AAACTTCTAA GAAAGTCCAC CTTCAAAAAT 1020
GCAAAACTCT ATGGTCCTGA TGTTGGTCAG CCTCGAAGAA AGACGGCTAA GATGCTGAAG 1080
AGCTTCCTGA AGGCTGGTGG AGAAGTGATT GATTCAGTTA CATGGCATCA CTACTATTTG 1140
AATGGACGGA CTGCTACCAG GGAAGATTTT CTAAACCCTG ATGTATTGGA CATTTTTATT 1200
TCATCTGTGC AAAAAGTTTT CCAGGTGGTT GAGAGCACCA GGCCTGGCAA GAAGGTCTGG 1260
TTAGGAGAAA CAAGCTCTGC ATATGGAGGC GGAGCGCCCT TGCTATCCGA CACCTTTGCA 1320
GCTGGCTTTA TGTGGCTGGA TAAATTGGGC CTGTCAGCCC GAATGGGAAT AGAAGTGGTG 1380
ATGAGGCAAG TATTCTTTGG AGCAGGAAAC TACCATTTAG TGGATGAAAA CTTCGATCCT 1440
TTACCTGATT ATTGGCTATC TCTTCTGTTC AAGAAATTGG TGGGCACCAA GGTGTTAATG 1500
GCAAGCGTGC AAGGTTCAAA GAGAAGGAAG CTTCGAGTAT ACCTTCATTG CACAAACACT 1560
GACAATCCAA GGTATAAAGA AGGAGATTTA ACTCTGTATG CCATAAACCT CCATAACGTC 1620
ACCAAGTACT TGCGGTTACC CTATCCTTTT TCTAACAAGC AAGTGGATAA ATACCTTCTA 1680
AGACCTTTGG GACCTCATGG ATTACTTTCC AAATCTGTCC AACTCAATGG TCTAACTCTA 1740
AAGATGGTGG ATGATCAAAC CTTGCCACCT TTAATGGAAA AACCTCTCCG GCCAGGAAGT 1800
TCACTGGGCT TGCCAGCTTT CTCATATAGT TTTTTTGTGA TAAGAAATGC CAAAGTTGCT 1860
GCTTGCATCT GAAAATAAAA TATACTAGTC CTGACACTG 1899

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 592 

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14

Met Glu Gly Ala Val Gly Gly Val Arg Arg Arg Asn Gly Ala Glu
5 10 15

Glu Arg Arg Lys Gly Arg Trp Gly Ser Ala Gly Gly Ser Ala Arg
20 25 30

Ala Leu Asp Ser Pro Leu Arg Gly Ser Trp Arg Gly Glu Gln Pro
35 40 45

Gly Glu Pro Lys Met Leu Leu Arg Ser Lys Pro Ala Leu Pro Pro
50 55 60

Pro Leu Met Leu Leu Leu Leu Gly Pro Leu Gly Pro Leu Ser Pro
65 70 75

Gly Ala Leu Pro Arg Pro Ala Gln Ala Gln Asp Val Val Asp Leu
80 85 90

Asp Phe Phe Thr Gln Glu Pro Leu His Leu Val Ser Pro Ser Phe
95 100 105

Leu Ser Val Thr Ile Asp Ala Asn Leu Ala Thr Asp Pro Arg Phe
110 115 120

Leu Ile Leu Leu Gly Ser Pro Lys Leu Arg Thr Leu Ala Arg Gly
125 130 135

Leu Ser Pro Ala Tyr Leu Arg Phe Gly Gly Thr Lys Thr Asp Phe
140 145 150

Leu Ile Phe Asp Pro Lys Lys Glu Ser Thr Phe Glu Glu Arg Ser
155 160 165

Tyr Trp Gln Ser Gln Val Asn Gln Asp Ile Cys Lys Tyr Gly Ser
170 175 180

Ile Pro Pro Asp Val Glu Glu Lys Leu Arg Leu Glu Trp Pro Tyr
185 190 195

Gln Glu Gln Leu Leu Leu Arg Glu His Tyr Gln Lys Lys Phe Lys
200 205 210

Asn Ser Thr Tyr Ser Arg Ser Ser Val Asp Val Leu Tyr Thr Phe
215 220 225

Ala Asn Cys Ser Gly Leu Asp Leu Ile Phe Gly Leu Asn Ala Leu
230 235 240

Leu Arg Thr Ala Asp Leu Gln Trp Asn Ser Ser Asn Ala Gln Leu
245 250 255

Leu Leu Asp Tyr Cys Ser Ser Lys Gly Tyr Asn Ile Ser Trp Glu
260 265 270

Leu Gly Asn Glu Pro Asn Ser Phe Leu Lys Lys Ala Asp Ile Phe
275 280 285

Ile Asn Gly Ser Gln Leu Gly Glu Asp Tyr Ile Gln Leu His Lys
290 295 300

Leu Leu Arg Lys Ser Thr Phe Lys Asn Ala Lys Leu Tyr Gly Pro
305 310 315

Asp Val Gly Gln Pro Arg Arg Lys Thr Ala Lys Met Leu Lys Ser
320 325 330

Phe Leu Lys Ala Gly Gly Glu Val Ile Asp Ser Val Thr Trp His
335 340 345

His Tyr Tyr Leu Asn Gly Arg Thr Ala Thr Arg Glu Asp Phe Leu
350 355 360

Asn Pro Asp Val Leu Asp Ile Phe Ile Ser Ser Val Gln Lys Val
365 370 375

Phe Gln Val Val Glu Ser Thr Arg Pro Gly Lys Lys Val Trp Leu
380 385 390

Gly Glu Thr Ser Ser Ala Tyr Gly Gly Gly Ala Pro Leu Leu Ser
395 400 405

Asp Thr Phe Ala Ala Gly Phe Met Trp Leu Asp Lys Leu Gly Leu
410 415 420

Ser Ala Arg Met Gly Ile Glu Val Val Met Arg Gln Val Phe Phe
425 430 435

Gly Ala Gly Asn Tyr His Leu Val Asp Glu Asn Phe Asp Pro Leu
440 445 450

Pro Asp Tyr Trp Leu Ser Leu Leu Phe Lys Lys Leu Val Gly Thr
455 460 465

Lys Val Leu Met Ala Ser Val Gln Gly Ser Lys Arg Arg Lys Leu
470 475 480

Arg Val Tyr Leu His Cys Thr Asn Thr Asp Asn Pro Arg Tyr Lys
485 490 495

Glu Gly Asp Leu Thr Leu Tyr Ala Ile Asn Leu His Asn Val Thr
500 505 510

Lys Tyr Leu Arg Leu Pro Tyr Pro Phe Ser Asn Lys Gln Val Asp
515 520 525

Lys Tyr Leu Leu Arg Pro Leu Gly Pro His Gly Leu Leu Ser Lys
530 535 540

Ser Val Gln Leu Asn Gly Leu Thr Leu Lys Met Val Asp Asp Gln
545 550 555

Thr Leu Pro Pro Leu Met Glu Lys Pro Leu Arg Pro Gly Ser Ser
560 565 570

Leu Gly Leu Pro Ala Phe Ser Tyr Ser Phe Phe Val Ile Arg Asn
575 580 585

Ala Lys Val Ala Ala Cys Ile
590 592

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1899 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15






GGG 3

AAA GCG AGC AAG GAA GTA GGA GAG AGC CGG GCA GGC GGG GCG GGG 48

TTG GAT TGG GAG CAG TGG GAG GGA TGC AGA AGA GGA GTG GGA GGG 93

ATG GAG GGC GCA GTG GGA GGG GTG AGG AGG CGT AAC GGG GCG GAG 138
Met Glu Gly Ala Val Gly Gly Val Arg Arg Arg Asn Gly Ala Glu
5 10 15

GAA AGG AGA AAA GGG CGC TGG GGC TCG GCG GGA GGA AGT GCT AGA 183
Glu Arg Arg Lys Gly Arg Trp Gly Ser Ala Gly Gly Ser Ala Arg
20 25 30

GCT CTC GAC TCT CCG CTG CGC GGC AGC TGG CGG GGG GAG CAG CCA 228
Ala Leu Asp Ser Pro Leu Arg Gly Ser Trp Arg Gly Glu Gln Pro
35 40 45

GGT GAG CCC AAG ATG CTG CTG CGC TCG AAG CCT GCG CTG CCG CCG 273
Gly Glu Pro Lys Met Leu Leu Arg Ser Lys Pro Ala Leu Pro Pro
50 55 60

CCG CTG ATG CTG CTG CTC CTG GGG CCG CTG GGT CCC CTC TCC CCT 318
Pro Leu Met Leu Leu Leu Leu Gly Pro Leu Gly Pro Leu Ser Pro
65 70 75

GGC GCC CTG CCC CGA CCT GCG CAA GCA CAG GAC GTC GTG GAC CTG 363
Gly Ala Leu Pro Arg Pro Ala Gln Ala Gln Asp Val Val Asp Leu
80 85 90

GAC TTC TTC ACC CAG GAG CCG CTG CAC CTG GTG AGC CCC TCG TTC 408
Asp Phe Phe Thr Gln Glu Pro Leu His Leu Val Ser Pro Ser Phe
95 100 105

CTG TCC GTC ACC ATT GAC GCC AAC CTG GCC ACG GAC CCG CGG TTC 453
Leu Ser Val Thr Ile Asp Ala Asn Leu Ala Thr Asp Pro Arg Phe
110 115 120

CTC ATC CTC CTG GGT TCT CCA AAG CTT CGT ACC TTG GCC AGA GGC 498
Leu Ile Leu Leu Gly Ser Pro Lys Leu Arg Thr Leu Ala Arg Gly
125 130 135

TTG TCT CCT GCG TAC CTG AGG TTT GGT GGC ACC AAG ACA GAC TTC 543
Leu Ser Pro Ala Tyr Leu Arg Phe Gly Gly Thr Lys Thr Asp Phe
140 145 150

CTA ATT TTC GAT CCC AAG AAG GAA TCA ACC TTT GAA GAG AGA AGT 588
Leu Ile Phe Asp Pro Lys Lys Glu Ser Thr Phe Glu Glu Arg Ser
155 160 165

TAC TGG CAA TCT CAA GTC AAC CAG GAT ATT TGC AAA TAT GGA TCC 633
Tyr Trp Gln Ser Gln Val Asn Gln Asp Ile Cys Lys Tyr Gly Ser
170 175 180

ATC CCT CCT GAT GTG GAG GAG AAG TTA CGG TTG GAA TGG CCC TAC 678
Ile Pro Pro Asp Val Glu Glu Lys Leu Arg Leu Glu Trp Pro Tyr
185 190 195

CAG GAG CAA TTG CTA CTC CGA GAA CAC TAC CAG AAA AAG TTC AAG 723
Gln Glu Gln Leu Leu Leu Arg Glu His Tyr Gln Lys Lys Phe Lys
200 205 210

AAC AGC ACC TAC TCA AGA AGC TCT GTA GAT GTG CTA TAC ACT TTT 768
Asn Ser Thr Tyr Ser Arg Ser Ser Val Asp Val Leu Tyr Thr Phe
215 220 225

GCA AAC TGC TCA GGA CTG GAC TTG ATC TTT GGC CTA AAT GCG TTA 813
Ala Asn Cys Ser Gly Leu Asp Leu Ile Phe Gly Leu Asn Ala Leu
230 235 240

TTA AGA ACA GCA GAT TTG CAG TGG AAC AGT TCT AAT GCT CAG TTG 858
Leu Arg Thr Ala Asp Leu Gln Trp Asn Ser Ser Asn Ala Gln Leu
245 250 255

CTC CTG GAC TAC TGC TCT TCC AAG GGG TAT AAC ATT TCT TGG GAA 903
Leu Leu Asp Tyr Cys Ser Ser Lys Gly Tyr Asn Ile Ser Trp Glu
260 265 270

CTA GGC AAT GAA CCT AAC AGT TTC CTT AAG AAG GCT GAT ATT TTC 948
Leu Gly Asn Glu Pro Asn Ser Phe Leu Lys Lys Ala Asp Ile Phe
275 280 285

ATC AAT GGG TCG CAG TTA GGA GAA GAT TAT ATT CAA TTG CAT AAA 993
Ile Asn Gly Ser Gln Leu Gly Glu Asp Tyr Ile Gln Leu His Lys
290 295 300

CTT CTA AGA AAG TCC ACC TTC AAA AAT GCA AAA CTC TAT GGT CCT 1038
Leu Leu Arg Lys Ser Thr Phe Lys Asn Ala Lys Leu Tyr Gly Pro
305 310 315

GAT GTT GGT CAG CCT CGA AGA AAG ACG GCT AAG ATG CTG AAG AGC 1083
Asp Val Gly Gln Pro Arg Arg Lys Thr Ala Lys Met Leu Lys Ser
320 325 330

TTC CTG AAG GCT GGT GGA GAA GTG ATT GAT TCA GTT ACA TGG CAT 1128
Phe Leu Lys Ala Gly Gly Glu Val Ile Asp Ser Val Thr Trp His
335 340 345

CAC TAC TAT TTG AAT GGA CGG ACT GCT ACC AGG GAA GAT TTT CTA 1173
His Tyr Tyr Leu Asn Gly Arg Thr Ala Thr Arg Glu Asp Phe Leu
350 355 360

AAC CCT GAT GTA TTG GAC ATT TTT ATT TCA TCT GTG CAA AAA GTT 1218
Asn Pro Asp Val Leu Asp Ile Phe Ile Ser Ser Val Gln Lys Val
365 370 375

TTC CAG GTG GTT GAG AGC ACC AGG CCT GGC AAG AAG GTC TGG TTA 1263
Phe Gln Val Val Glu Ser Thr Arg Pro Gly Lys Lys Val Trp Leu
380 385 390

GGA GAA ACA AGC TCT GCA TAT GGA GGC GGA GCG CCC TTG CTA TCC 1308
Gly Glu Thr Ser Ser Ala Tyr Gly Gly Gly Ala Pro Leu Leu Ser
395 400 405

GAC ACC TTT GCA GCT GGC TTT ATG TGG CTG GAT AAA TTG GGC CTG 1353
Asp Thr Phe Ala Ala Gly Phe Met Trp Leu Asp Lys Leu Gly Leu
410 415 420



TCA GCC CGA ATG GGA ATA GAA GTG GTG ATG AGG CAA GTA TTC TTT 1398
Ser Ala Arg Met Gly Ile Glu Val Val Met Arg Gln Val Phe Phe
425 430 435

GGA GCA GGA AAC TAC CAT TTA GTG GAT GAA AAC TTC GAT CCT TTA 1443
Gly Ala Gly Asn Tyr His Leu Val Asp Glu Asn Phe Asp Pro Leu
440 445 450

CCT GAT TAT TGG CTA TCT CTT CTG TTC AAG AAA TTG GTG GGC ACC 1488
Pro Asp Tyr Trp Leu Ser Leu Leu Phe Lys Lys Leu Val Gly Thr
455 460 465

AAG GTG TTA ATG GCA AGC GTG CAA GGT TCA AAG AGA AGG AAG CTT 1533
Lys Val Leu Met Ala Ser Val Gln Gly Ser Lys Arg Arg Lys Leu
470 475 480

CGA GTA TAC CTT CAT TGC ACA AAC ACT GAC AAT CCA AGG TAT AAA 1578
Arg Val Tyr Leu His Cys Thr Asn Thr Asp Asn Pro Arg Tyr Lys
485 490 495

GAA GGA GAT TTA ACT CTG TAT GCC ATA AAC CTC CAT AAC GTC ACC 1623
Glu Gly Asp Leu Thr Leu Tyr Ala Ile Asn Leu His Asn Val Thr
500 505 510

AAG TAC TTG CGG TTA CCC TAT CCT TTT TCT AAC AAG CAA GTG GAT 1668
Lys Tyr Leu Arg Leu Pro Tyr Pro Phe Ser Asn Lys Gln Val Asp
515 520 525

AAA TAC CTT CTA AGA CCT TTG GGA CCT CAT GGA TTA CTT TCC AAA 1713
Lys Tyr Leu Leu Arg Pro Leu Gly Pro His Gly Leu Leu Ser Lys
530 535 540

TCT GTC CAA CTC AAT GGT CTA ACT CTA AAG ATG GTG GAT GAT CAA 1758
Ser Val Gln Leu Asn Gly Leu Thr Leu Lys Met Val Asp Asp Gln
545 550 555

ACC TTG CCA CCT TTA ATG GAA AAA CCT CTC CGG CCA GGA AGT TCA 1803
Thr Leu Pro Pro Leu Met Glu Lys Pro Leu Arg Pro Gly Ser Ser
560 565 570

CTG GGC TTG CCA GCT TTC TCA TAT AGT TTT TTT GTG ATA AGA AAT 1848
Leu Gly Leu Pro Ala Phe Ser Tyr Ser Phe Phe Val Ile Arg Asn
575 580 585

GCC AAA GTT GCT GCT TGC ATC TGA AAA TAA AAT ATA CTA GTC CTG 1893
Ala Lys Val Ala Ala Cys Ile
590 592

ACA CTG 1899
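
The codon-by-codon translation shown under the nucleotide rows of SEQ ID NO: 15 can be spot-checked with a few lines of code. The following is only a minimal sketch: it assumes the standard genetic code, prints one-letter amino acid codes rather than the three-letter codes used in the listing, and the names `CODON_TABLE` and `translate` are illustrative and not part of the listing.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, stopping at the first stop codon."""
    dna = "".join(dna.split()).upper()  # drop the spaces used in the listing
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon: end of the open reading frame
            break
        protein.append(aa)
    return "".join(protein)


# First seven codons of the reading frame (Met Glu Gly Ala Val Gly Gly).
print(translate("ATG GAG GGC GCA GTG GGA GGG"))  # MEGAVGG
# Last four codons plus the TGA stop (Ala Ala Cys Ile).
print(translate("GCT GCT TGC ATC TGA"))          # AACI
```

Running the same function over the full coding region would be one way to confirm that a transcribed copy of the nucleotide rows still encodes the 592-residue sequence of SEQ ID NO: 14.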

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 594 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16
ATTACTATAG GGCACGCGTG GTCGACGGCC CGGGCTGGTA TTGTCTTAAT «™GTt« *° 

rrr TTT ag™ «^ 

TTTTTTCAGG CAAAAGTAAA ATACCTGAGA AACTGCCTGG CCAGAGGACA ATCAGATTT 
GGCTgScA AGTGACAAGC AAGTGTTTAT AAGCTAGATG GGAGAGGAAG GGATGAATAC 240 

™ = = = = = "° 

r— c— := z 

rrrGGCGCTT GGATCCCGGC CATCTCCGCA CCCTTCAAGT GGGTGTGGGT GATT1C 
GTGaTcG GA CCGCCACCGG GGGGAAAGCG AGCAAGGAAG TAGGAGAGAG CCGGG CAGG c 

GGGGCGGGGT TGGATTGGGA GCAGTGGGAG GGATGCAGAA GAGGAGTGGG AGGG

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17
CCCCAGGAGC AGCAGCATCA G 21 
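
Each sequence row in this listing ends with a running base count, and each entry declares a LENGTH. A short script can confirm that the two agree after OCR or retyping. The sketch below is illustrative only: the function name `check_declared_length` is hypothetical, and it assumes a single-row entry whose last whitespace-separated token is the count.

```python
def check_declared_length(listing_line: str) -> tuple[str, int, bool]:
    """Split a single-row listing entry into sequence and trailing count.

    Returns the joined sequence, the declared count, and whether they agree.
    """
    tokens = listing_line.split()
    declared = int(tokens[-1])            # trailing number, e.g. 21
    seq = "".join(tokens[:-1]).upper()    # 10-base groups joined together
    return seq, declared, len(seq) == declared


seq, declared, ok = check_declared_length("CCCCAGGAGC AGCAGCATCA G 21")
print(seq, declared, ok)  # CCCCAGGAGCAGCAGCATCAG 21 True
```

For multi-row entries the same idea applies row by row, comparing the cumulative number of bases against the number printed at the end of each row.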

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18 
AGGCTTCGAG CGCAGCAGCA T 21 
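
Several of the short single-stranded entries read as antisense to the cDNA of SEQ ID NO: 13. As a hedged illustration, the sketch below computes the reverse complement of SEQ ID NO: 18 and compares it with the 30 bases at positions 241-270 of SEQ ID NO: 13. The helper name `reverse_complement` and the constants are illustrative, and the comparison says nothing about how the oligonucleotide was actually used.

```python
# Complement map for unambiguous DNA bases only (an assumption of this sketch).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


PRIMER = "AGGCTTCGAGCGCAGCAGCAT"                 # SEQ ID NO: 18, spaces removed
CDNA_FRAGMENT = "ATGCTGCTGCGCTCGAAGCCTGCGCTGCCG"  # SEQ ID NO: 13, positions 241-270

rc = reverse_complement(PRIMER)
print(rc)                            # ATGCTGCTGCGCTCGAAGCCT
print(CDNA_FRAGMENT.startswith(rc))  # True
```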

(2) INFORMATION FOR SEQ ID NO:19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19 
GTAATACGAC TCACTATAGG GC 22 

(2 ) INFORMATION FOR SEQ ID NO:20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20 
ACTATAGGGC ACGCGTGGT 19 

( 2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 

(B) TYPE : nucleic acid 

(C) STRANDEDNESS: single 

( D ) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21 
CTTGGGCTCA CCTGGCTGCT C 21 

(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 

(B) TYPE : nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22 
AGCTCTGTAG ATGTGCTATA CAC 23 

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23 
GCATCTTAGC CGTCTTTCTT CG 22 

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24 
GAGCAGCCAG GTGAGCCCAA GAT 23 

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25 
TTCGATCCCA AGAAGGAATC AAC 23 

(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26
AGCTCTGTAG ATGTGCTATA CAC 23 

(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27 
TCAGATGCAA GCAGCAACTT TGGC 24 

(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28
GCATCTTAGC CGTCTTTCTT CG 22



(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29
GTAGTGATGC CATGTAACTG AATC 24



(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30
AGGCACCCTA GAGATGTTCC AG 22



(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31
GAAGATTTCT GTTTCCATGA CGTG 24



(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 25

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32
CCACACTGAA TGTAATACTG AAGTG 25



(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33
CGAAGCTCTG GAACTCGGCA AG 22



(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34
GCCAGCTGCA AAGGTGTTGG AC 22

(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35 
AACACCTGCC TCATCACGAC TTC 23 

(2 ) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:36 
GCCAGGCTGG CGTCGATGGT GA 22 

(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37
GTCGATGGTG ATGGACAGGA AC 22 

(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:38 
GTAATACGAC TCACTATAGG GC 22 

(2 ) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:39 
ACTATAGGGC ACGCGTGGT 19 

(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40 
CCATCCTAAT ACGACTCACT ATAGGGC 27 

(2) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41
ACTCACTATA GGGCTCGAGC GGC 23

(2) INFORMATION FOR SEQ ID NO: 42:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 44848

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42

rraTCTTGGC TCACTGCAAT CTCTGCCTCC CATGCAATTC TTATGCATCA 50
rrCTCCTGAG TAGCTTGGAT TATAGGTCTG CGCCACCACT CCTGGCTACA 100
PCA?GT?GCC CAGGCTGGTC TTGAACTCTT GGGCTCTAGT GATCCACCCG 150
rr T^GGCCTC CCAAAGTGCT GGGATTACAG GTGTGAGCCA TCACACCCGG 200
CCCCCCGTTT CCATATTAGT AACTCACATG TAGACCACAA GGATGCACTA 250 
tttagaS TTGCAATGGT CCACTTTTCA AATCACCCAA acatgttaaa 300 
CAAATTGGTA TGACTGGGCA TGGCACAGTG GCTCATGCCT GCAATCCTAG 350 
rATTTTGTGA GGCTGAGACG GGCAGATCAC GAGGTCAGGA GATTGAGACC 400 
ATCCTGACAG ACATGGTGAA ATCCCATCTC TACTAAAAAT ACAAAACAAT 450 
TlrrrrGGGG T G AT GGC AGG CCCCTGTAGT CCCAGCTACT CGGGAGGCTG 500 

Iggcaggaga Itggcgtgaa tccaggaggc aoagcttgca gtgagccgag 55 

wrCTCCCAC TGCACTCCAG CCTGGGCGAC AGAGCGAGAC TCCG TCTCAA ou 

1= = iiii iiii = 

111 iiii = iiii ii| 

mItootm ^taagagag ggtctcactt tgtcacccag gctggagtgc 950 

AGTGGTGTGA TTAAGGGTCA CTGCAACCTC CACCTCCCAG GCTCAAATAA 
ACCTCCCACC TCAGCCTCCC CAGTAGCTGG ^CACAGGC ^GGGCCACC 
nr-rrr-rrrm AATTTTTTGT ATTTTTTGTA GAGATGGGGT 1 ILAH-fti^i 
TrrrCAGGCT OTCTTGAAT tcctcggctc aagcaatcct cccaccttgg 
cctccS ?gctggcatc acaggcatga tggcatcact ggcatcacat 

^raTrrr^ GCCTGATTTA TGCAAATTAG ATATGCATTT CAAAATAATC 

Elli Usss sssse sees sees 

skEs sssss KSSE ssss sees ;«. 
Eil ssss ssssss sssss ssss 
Iiii sss sssss sssss sssss isss 
ii Isss sssss sssss sees 

cacIgtatgc acttggcagg gttgtgagaa gggaagagaa cacaagtaaa 

rrACCTGTAT CAGGCATACA GTAGGCACTA AGCGTGCGAT GCTTGCTATG 
attatIcatc AGTGTAAGCA TCAAGGAAAA GCTGAAGAAA agtctgacca 
ACAGCGAAAG ATAAATGCGC AGAGGAGAAA TTTGGCAAAG GCTCCAAATT 1950 
npcTACTCTA CACTTTGTAT GGGGGCTTCA GGTCCTGAGT 2000 

™ iiii = £™c sssss i 
™ esse = is i 

ranrTrTTTA TAAGCTAGAT GGGAGAGGAA GGGATGAATA ^ itLAU ^ 

™ gaIggtcaga gggatacccg gcgccatcag ^tgggatct 

gggIgtcgga aacgctgggt tcccacgaga gcgcgcagaa cacgtgcgtc 

orrurrrTf, GTCCGGGATG CCCAGCGCTG CTCCCCGGGC GCTCCTCCUt. 
CCGCGCTCCT CCCCAGGCCT CCCGGGCGCT TGGATCCCGG CCATCTCCGC 
ACCCTTCAAG TGGGTGTGGG TGATTTCGTA AGTGAACGTG ACCGCCRCCG 
Sc GAGCAAGGAA GTAGGAGAGA GCCGGGCAGG CGGGGCGGGG 
TTrrATTGGG AGCAGTGGGA GGGATGCAGA AGAGGAGTGG GAGGGATGGA 

gggcgcIgtg ggaggggtga ggaggcgtaa cggggcggag gaaaggagaa 

T^rrrrrrTC GGGCTCGGCG GGAGGAAGTG CTAGAGCTCT CGALititub 

c^gcg^ggca gctcgcgggg ggagcagcca ggtgagccca agatgctgct 



1000 

1050 

1100 

1150 

1200 

1250 

1300 

1350 

1400 



1700 
1750 
1800 
1850 
1900 



2300 
2350 
2400 
2450 
2500 
2550 
2600 
2650 
2700 
2750 



GCGCTCGAAG CCTGCGCTGC CGCCGCCGCT GATGCTGCTG CTCCTGGGGC 2800
CGCTGGGTCC CCTCTCCCCT GGCGCCCTGC CCCGACCTGC GCAAGCACAG 2850
GACGTCGTGG ACCTGGACTT CTTCACCCAG GAGCCGCTGC ACCTGGTGAG 2900
CCCCTCGTTC CTGTCCGTCA CCATTGACGC CAACCTGGCC ACGGACCCGC 2950
GGTTCCTCAT CCTCCTGGGG TAAGCGCCAG CCTCCTGGTC CTGTCCCCTT 3000
TCCTGTCCTC CTGACACCTA TGTCTGCCCC GCCAGCGGCT CTCCTTCTTT 3050
TGCGCGGAAA CAACTTCACA CCGGAACCTC CCCGCCTGTC TCTCCCCACC 3100
CCACTTCCCG CCTCTCATTC TCCCTCTCCC TCCCTTACTC TCAGACCCCA 3150
AACCGCTTTT TGGGGGGTAT CATTTAAAAA ATAGATTTAG GGGTTACAAG 3200
TGCAGTTCTG TTCCATGGGT ATATTGCATT GTGGTGGCAT CTGGGCTCTT 3250
AGTGTAACTG TCACCCGAAT GTTGTACATT GTATCTAATA GGTAATTTCT 3300
CATCCCTCAT CCCTCTCCCA CCCTCCCACC TTTTGGAGTC TCCAGTGTCT 3350
ACTATTCCAC TAAGTCCATG TGTACACATT GTTTAGCGCC CACTCTAAAT 3400
GAGCCTTTTT GTTTCATTCA TTCTGTAAGT GTTGAATAGG CACCACCTAA 3450
GGTCAGGTAT AAGTGGAAAT TTGAAAAAGA AACTGCCCAC TTGCCCCAGT 3500
ACTTCCCTAG CCAAGAGGAG GGAAACCAGG CAGGTGCACC TGAAGGCCTG 3550
TGAGTGCTTG ATTTGCTGTG CAGTGTAGGA CAAGTAAGAT TGTGCATAGC 3600
CTTCTGTATT TAAGACTGTG TTAGGAAGAT TTCTCTTTCT TTTCTTTTCT 3650
TTTTCTTTTT TCTTTTCTTT TTTTTTTTTA GGCAGATGAA AAGGGCGTCA 3700
CAGAACAGGA ATAAAAATCT AAATATTCAA TAAATGAGAC CTAGGAGACT 3750
ACTGCAGTGA CTTACAAAGT CCTAATAAAA AGATGTCTCT CCAAAATGGG 3800
GCTGCAAAAT GTGGTGCTGC CTTATCAGCT CTAAGTTTTT TCCTTACCTG 3850
AGAAAGAAGG AACCTGATGC AGGTTCAGGG CTCCTGCCCC ATGAATGCAG 3900
GCTGACTCCA AGATGGGGAG CTACAGGGAC AATCCCAGGT CTTCTAGGCC 3950
TCTTATTTAG GCCCTGGGAG CCTCCAGAGA TGGCCACATC TTGACCAGCC 4000
CAGATAGAGG GAAAGATCAC CATTATCTCA CCTCTGTGTC AAATACCTAG 4050
ATGCTGTCCT CCCTGAGCCC ACACTATAGT TGCCAGCGCT AATTTAATGG 4100
GTAGTGTACT GGTTAAGAGA TGGACAGACC ATCCTGGCTT GACTCTCAGC 4150
TCTGGCAAAG ATGAGTGACT TGGTTTTTCC ATATCTCTTG GCCACACCAA 4200
CCTTGATTTC TTCAGCTGTA GAATGGAATT TCTCAAGCTT GCCTCAAGGA 4250
TTATTGCCCG AGGATTTGAT GATATGGTAA GAGCTTCTCA GTGTTTGACC 4300
CATAGTAAGT GTTTGACGTT TCAAACGAAT TGTTTCTTTC TAGGACATGG 4350
TGAGCATTTG GTAGCCATTC ACCGGTTTTC TGTTTCTTTG GATCATAGTT 4400
AACCTCTCCT TTTCCTTCTG GCACTACAAT TTTCTGGTGG GGAAGAATCC 4450
TTACTTTCTG CCCTTCCCCT TAAGGATAGG AAGCTGATAC TAGGCAGCAA 4500
CTAGTTGGGG GATAGGAAGA TTGTTCCAGA GAAATGCTGA ACCATAGGGC 4550
TCCAGATCAC AGGACCCCAG TCTTAGCTTG CTGGGGTGTG GGGTGGGGGG 4600
GGGCGGTTAC TGAACATGGG TATGAAGTAG ATGTCCATTT ACTGAAATGT 4650
GAGGACCTGA GGCCTCTTCT ATTGCTGTAG CCAGCATATT CCCCAACCTC 4700
TCCCCAAGAA AGGACAGATG GGGGTTCCCC CCTGGAGTAA CAGGTCCAAA 4750
AGAAAAAACA TACAGTGGGA CTTCCAGGAT CTGGGCCTGA TCACCCAGCA 4800
GTCAAGCTCC CCGCAATTGA CTAACACCCC CCTAACACGT AGAAATTCCA 4850
ATCTGCAATT TAGTGAGGAT GATACCTTTA TTCTTCTTAA ATACATCTCT 4900
TCATTTCCCA GAGCACCCTT TTTTCCCCTC CTCTGCACCT TTTTGTTAAA 4950
GACTGGAGTA TAATGAAATA CCAAGAGAGC ATAACATGTG ATACATAAAA 5000
CTTTTTTTCT GGTTTACAAA ACAGTTCATT CTTGTCCATA CGTGCTTCTC 5050
TCCAAGGCTG GCTGCTGTCT GTTCCAGCCC GCTTCGCTTG GAGAGGCCAT 5100
CTGCCATACC TGCTCCCCAG ACGCATCGAC AAGCACACCC AGAGTGTTAT 5150
CTGCTAAGAC CTAAAAGAGG GAGGAACCCC CTCTCCTCAT CTAAGACCTA 5200
GCTTCTAAAT TAGAGTGTGA GGGTCCATCT CCCCAGGAGG GGCACAGGGC 5250
CCAAACAGCC CAGCCATCTC AGAAGACAAC ACTAAGCTTT GTAGGGGTCC 5300
ACAGTAGAGG AGAGTAAGAC GCCTGTTGTT TAATTTATTA CAGTTCCTCA 5350
AAAGTGAAGA TGTGTGGGCG GGATGGCAAG AGCTGAGCAG ACGAAAGCTG 5400
AAGGAATAAG GAAAGAGAGG AGGACACAAA CAGCTGACAC TTCCTCAGTT 5450
CTTGTCATTT GCCTGGCCCT GTTCTAAGCA CCTTCTAGGT ATTAATCCAT 5500
TTAGTCTTGG CTACAACACT GTGAGTAACT AGTTTTGTCA CCCCCATTTT 5550
AAAAATGAAG AAAGTGAGGC TCAGGGAGGT TAAGTAACTT GGCCACAGTT 5600
TGAAACTAGA CTCTGATCAC ATGAGATAAT AGTGCCCATA AAAAGGGAAA 5650
GCAGATTATA TTTTTTAAAG GAAAGAGAGT AGGATATGGT AGAAAAAGAT 5700
TGTTTGGAAA GGAATTGAGA GATTGATATA ATGAAAAGAA GCATTCACAT 5750
GAGAGTAACA GTATCAGGGC CCAAACCTTC ATCTAAGGTA CTTCAAAGAG 5800
GCCTAAGCAA ACTTAGTCAC TGGCGTGGTT CTAGTCTCCA TGATGGCAAA 5850
TACATTGTGT ACAGCCCAAC TCCACACAAA ACTTAAATAC CAATGATAGA 5900
GCAATCTAAA ATTTGAAAGA AAAAATCTTT CAATTTGTCG TCTTCCCAGA 5950
GGGACTTAAT CAAGAAACCA ATCAAAATAC TTCCTAAGCC TAACTGTGTG 6000
CAGAACTCCA AAGAGAGCCC AGCCCTAAAT CAACACTGTC CAATGGAAAT 6050
ATAATATAAT GTGGGCCTCA TATGCAAGGT CATATGTAAT TTTAAATTTT 6100
CTAGTAGCCA TATTAAAAAG GTAAAAAGAA ACAAGTGAAA TTAATTTTAA 6150
TAATTTTATT TAGTTCAATA GATCCAAAAT GTTTTCTCAG CATGTAATCA 6200
ATATAAAAAT ATTAATGAGG TATTTATTAT TCCTTTTCTC AAACCAAGTC 6250
TATTCTATAA TCTGGCGTGT ATTATTTACA GCACTTCTCA GACTATATTT 6300
CTTTCTTTCT TTTTTTTTTC CGAGACAATT TTGCTCTTGT CACCCAAGCT 6350
AGAGTACAAT GGCGTTACCT CGGCTCACTG CAACCTCCGC CTCCCGGGTT 6400
CAAGTTATTC TCCTGCCTCA GTCTCCCAAG TAGCTGGGAC TAGAGGCATG 6450
CACCACCACG CCTGGCTAAT TGTGTATTTT TAGTAGAGAC AGGGTTTCAC 6500
CATGTTGGCC AGGCTAATCT CAAACTCCTG AGCTCAGGTG ATATGCCCAC 6550
CTCGGCCTCC CAAAGTGTTG GGATTACAGG CGTGAGCCAC TGCACCCGGC 6600
CTCAGATTAA CTATATTTGA AGCGTTCAGT AGCCACATGT AGCTAGTGCT 6650
ATGGTAGTGG ACAGTACAGA TCTGCATTTC AATTAAGACA CGTATACAAG 6700
CATAGTTCAC TAATGCACGG TAAAAAAAAG TATAGTGCTG AGTCGGTGGT 6750
AGAAATCCTA
GATAATGCAA 
GCAAAGTTCA 
TTTCAATATT 
TAGAAAATTA 
AAAGAGGTGC 
CCAAAGGGAA 
CTCCTGGGAA 
ACTAACCCTG 
GTCCTCCTCA 
GCTATTCTCT 
GCATATTCTC 
GGATGACAGC 
TTTGACTTGT 
TTGTAAAATG 
TTGAGTGAAA 
CTGATGTGCA 
CACATCTGGC 
TTACTTACTC 
CTCTCCACTT 
GATAAACTGT 
TACACTCAAG 
ATTTCATCTC 
AGCCATGGTG 
AGCACCTTTT 
CCATCATAAC 
TACTTCTTCC 
CTTGTAATAA 
ACTAAACCTG 
GTGGCCCAAA 
ATATTAATAA 
GTATAGCTGG 
CAGCCACACA 
CCTGGCCCTT 
CTCTACTTTT 
ACATGACCAT 
CCTTTTCCTC 
TTTCCTCCTT 
CCCGTCCCCT 
CCTCTCCTGT 
CTTTCCCCAA 
AGAAACCACC 
TGCCCTCATG 
ATCATCAATG 
TTTGGTCTTT 
AGTCCTAACC 
ATTGCAGACG 
GCTTATCTAA 
GACACGCACA
AGTCAAAAAG 
CTGCGCCTAG 
TCGGCTTTTC 
AGTTTGCAGT 
GGAGGCATTT 
TCTGTCGCCT 
CTCTCTCTCT 
AGGCCTTCTT 
AGAAATTACT 
ACAACTGGCT 
TTCTGTGGTG 
TTAACAACTG 
GGCCAACAAC 
GAGAGAAGTA
GCCCCCTTTT 
TTCTTTTATG 
ATTTTTCCTT 
AGTTTTAGTT 
CAGAGGGCGT 
AGGAGACAAA 
TTGTACTTGG 
TTTTGTTCAG 
GGGCCTTGCT 
CCAAGTACTT 
CTGCTCTCAG 
CCAAGGAAAT 
AGAAAATGAA 
TTTGTACCTC 
TCATGTGATG 
CAGTCTCTTG 
AATGAGGTTC 



AATACTGCAG 
CCATGCTTGC 
TCCATTTTTG 
AGATTCTTGC 
CTTATCAATG 
AGACTCCCCA 
ACAAAGGGCT 
GTGCTGTCCC 
TCCACTGTGC 
CAGAATATCT 
GATGACACTG 
CCATAGTCCA 
CCACTAGTTT 
TACCTTGGGC 
ACGATAATAA 
GAAGGCGGGT 
TTACGGGTGA 
TCTCATCCAG 
CCCCTTATTA 
CCTAGTCTCA 
CTCAGTTTCT 
TTGTAACAGA 
AACTCTGTAT 
AGAATATTTA 
TTTCTGAGAG 
AATTTTTAAA 
ATATCTGATT 
TAACCCCAAA 
GTTTAGTCCA 
AACCTGGAAA 
GCCATTTTAA 
GCTATTGAGC 
GACTGATGTT 
AGTGTTACCC 
TAAAAATCTC 
ACTTCTGCTT 
CTGTCATCAA 
TCTCTGATCT 
CCCCAACCCC 
ATCTTCAACT 
GCCTTTCCCA 
CCGTTTCTTC 
ATGGCACCAT 
GCCTTCCTTT 
GTTATGGGTT 
CCCAGTACCT 
TTATTAGTTA 
TATGACTGAT 
TAGGGAGAAT 
CTATGGGAAC 
AGAGGGAGTA 
AAAACTGTAA 
ACTCTGCGAC 
GGCAAGGTTG 
TTCTTGTTGG 
AGTTTGTCTT 
TTCACTTCAC 
TAAATTACTG 
CTCTGGGAAG 
TAAATACTCC 
GCTCACAAAT 
GTGGTACAGC 
ACTTATTTTT 
TTTCCTTAAC 
CATTCATCTG 
TATTTTTCTT 
TTATGGCATG 
CAATATTATT 
CAAAAAGGTC 
CAGTGTCCAA 
GAAAGTCTTT 
CATTCATTCA 
ATCTAGGTAT 
GGAGCTTGCA 
GAAAAAGGAA 
GGCAGCGCAG 
TAAGGACCAG 
CCACAGCAAC 
CCCAGCCGCG 
CTCTATCATC 



AGCAAAAGTG 
TTTTCATTGC 
CCAATTCAAT 
ATCTTCATAG 
TTAAACACAC 
TGTGCCTATT 
GGGGACAATC 
TCTGATTGAG 
CCTGGAGCCC 
CCTCTACCTC 
TCTTCCCTGT 
GTTCTTTTCC 
GAACTCCATA 
AAATTACCTC 
TGCCATTTGC 
AGCTTCCCTA 
TGCCATGACT 
TGCTCCTGCT 
ACTGAAGACT 
CCATCATCCT 
TCACTCACAT 
ACCAGCTTAT 
TCAGTGACAT 
CCATGGAAAT 
CCAGACCATA 
TACCTCCACT 
TGAGCTTCTT 
TCCCTGTTCC 
ACCATATTTT 
TGGAAAAATA 
TGCTTCATTT 
TCTTGCGGGA 
GCACCAAACA 
TTAACTCTCC 
TGACTCCACC 
CCCAAAGAAA 
ATCTGCAGAC 
CAGTCTGCTT 
CAAGGACTTC 
CCTCCCATTT 
TCTCAATTAC 
CCTCCCCTCG 
CATTGTGTCA 
GTTGGGAAAC 
GAATGAGGTT 
CAGAATGTGA 
GGATGAGGTC 
GTCCTTATAA 
ACCATGTGAT 
TTAGGAGAAA 
TGGCCCTGCC 
GACAATACAT 
TGCAGCCCTA 
ACAATGGAAG 
GGGGTGTTTT 
AAACATTGGT 
ATATTCCCCT 
CTCATGCAGT 
AGGGGAGACT 
CTCCATGGCC 
TTTCTCCAAA 
CAACTCCAGC 
GTACAAAAGG
AAACTGCTCT 
TTATTTCCAT 
GTATATGGTT 
TTTTGCACCC 
TTCTCAAAAC 
CTTAATACCA 
GTGGTAAACC 
GTCTGGCAGC 
CTTAAGTATT 
CGGGTAGATT 
GCAGAGATGG 
GTTGATTTCA 
TGTGATGGAG 
ACTGTGACCC 
TTTTCCAGGT 
ACTGCTTACA 
AAACCCAATC 



GTACGAACAG 
AATTTGCTTA 
AAATATTTAC 
ACAGAGTTGC 
GTTTTGATAA 
GATGGCAGAA 
ACACACCTCA 
CTCTTATTAT 
TTTGCAGGGT 
CTTGTCCAAG 
AGCCCTTTTG 
TGTTCTCCAG 
CTGCTATAGT 
CTTTTGTTCA 
TTCAGTGGGT 
CACGCTCAGT 
CAGTGTGTTT 
TACGGCACTC 
GGCACTGATC 
AGATGACTTC 
TTTTTTATAA 
CCAGCTCATG 
CCTGTGGGTA 
TGGCAAATAC 
GCTCTTCTAC 
GAACAGCTTC 
AATTTATCAT
ATTGTTCTTC 
CTCTCTTTGG 
TTACTTATTA 
CCAGTCTCAG 
GGAGGGAGTG 
TTTTTTAGCT 
ATTTCTCTGC 
TTCACCTTAT 
ATGAGCAATT 
ATGTCATGCC 
CTTCCATTTC 
GCTCTATCAG 
TACTGGCTTC 
CTCCTCGCAC 
GCAGCCTGTT 
CTAAAATCAA 
CTAATAAACA 
ACCCCGAAAT 
CTTTATTTGG 
ATACTGGAAT 
CAAGGAGAAA 
GACAGGAGTT 
GACCTGGAAC 
ACTACCTTGA 
TTCTGTTGTT 
ACAAACTAAT 
CACTTTCTTA 
CTAACAATTC 
GTTCTTCAGA 
GGGTGGTCTC 
ACTGTGCTGG 
GGTTGATGGT 
AATTCCAAAC 
TTTAACATTT 
ACACCTCTGC 
TAAAATAAAA 
AGAAATAGAA 
GTCACTGTGG 
GAAATACTGT 
ATATTAAATC 
AAGAAAATAT
AAACTTTGAA 
CAAACAGTAT 
GACTTACCCT 
TATTAAACAC 
CTGATAAGTC
GGGCTGCAAT 
GAGAGTGATG 
AGTGACCCAA 
AGGTCACTCA 
GCTCGTTTCC 
AATACAGCTA 
AAAATGCCAA 



CAATCTCAGT 
TTTTCCTTCA 
TGATAAAAAC
TTTTCACATT 
CCAGTGTTGG 
ATATTCACAG 
TGTCTCCTAA 
TGCCTTCCCC 
TACCTGCTCT 
CTACAACTTG 
AGTAATGGCT 
TCTGGCTTCT 
TCAAGTCCCT 
GGTTCCTTGT 
TATTTTGAAA 
GTAGACTAGC 
TCCTCATCTC 
TGTCCCCCTC 
TCACAGTTTC 
AAGTCACCTA 
CAGATAATGT 
AAATGTATGC 
TCTGGAAATC 
TAAAAAGCAG 
TCCATAGCAC 
TTCCTCTCTC 
GTGAACCACT 
CTGCTAAAAT 
AATCTACAGG 
ATTTTAATGT 
TGGCCACCCT 
GACAGTCTCC 
TCCAGACTTC 
CTTTCACATT 
CATTCTTAGC 
ACTTCCTTTT 
TAAGTCCAGC 
TGCCCTGAAT 
TCACCTCTTC 
TTCCTCAAGC 
ATGCCTCTGC 
CTTCCTGTTC 
TCTCTCCGAC 
CTTTATCTTA 
CCATATTAGA
GAATAGGGTC 
GTGATGGGCT 
TTTGGAGACA 
ATGGAGTTGG 
AAATCCTTTC 
ATTCAACGTT 
CAAACCAATT 
ACAGTCTCTT 
CCCCTTTAGG 
CTCTCCATCT 
CTTCTGACCT 
ACCCACTTCC 
AAACTGTTTA 
TTTTGCTGAT 
TGCCAACAGT 
GGCTTTCACA 
TTTTGTGTCA 
ACACCTGCAG 
TAGCTGAAGC 
TGGTGGGATT 
ACCTTTGATC 
TAGTTTTTGT 
TTCATTGCAA 
ATGTGATTTC 
TGGGTTTTCA 
TACATCAGGC 
CAGCGGTGTG 
AGTCAGGTCC 
AGAGAGTAAG 
AATGCTATGA 
GGTGGTACAG 
CAGATGCCCG 
TCCCACTTCC 
GAGGAATCTA 
GGAACAGAAT 
- S TCCTCATCTT

-SI I sssss sssss sssss 

*™ 1 ££S 

™^ i i is =c 

TnrrrcAGGT AcSg££ ACGCCTATAA TCCCAGCACT TTGGGAGGCT 
GATCATTTGA GGCCAGGAGT TCAAGACCAG CCTGGCCAAC 
GAGGCAGGTG GATCAlli^ taaaAAAAGT TATTTTAAAA ACTCAAATCT 

iii S ssss sssss 
illiE Esss sssss S£s ii | 

TGGCTTACGG GGCCCTCCGT GATGTGGCCC ™TTTGCTTC ^CMTCTGT 11600 

fcTCTCCCAG CCTCTCTGOC COC^ TGGACAGACT i"«50 

GCTC ^^I rrTGGAATGC TTTCTTCAAT CCTACCCCAC TCTCTTTAAT 11700 
TT I^IIrr TTTATTCTTT TTGAATGTCT AGCAGTGAAA CCATTTCCCC 

SESS SSSSS SEES 
S SSSS =1 = S| S| 

*™rarrr«rr CAACAAATAT TTGTTGAATA AATTAACAGA TGGCTT1 Ai ^ 

ism 

liHSIEisiii 
111 ii iii Is ii 

li=iiiliiiiiii 

TGAAACTCTG TACCCATTAA ACAATAGTGC AT ^i^^i. trrTrrTTCT 
TACAATTTAT TTTTATTTGG GTTTGTACCA AACTGAAAAT AG ,CTGCTTCT 

iiii ilii in §1 Hi 

SEE = i S sssss 
i ii ii= = ssss ii |i 

SSfcT| C» S CC S JC^CSCC JTCg^CC JJCC^ 

=s «1 siii ssss sssx ™. 

AAACCTACAT GTAGACAAAC TAATTAGGCC ^TCCAAGA^ ii 1355Q 

SSS5S ES32S S5S2 SSg g 
ESSE 3S5S SSSSK S s iii 

S™S isss s%» 

SSfc? SISttCJ = ™ ^SSSS SIS 

TTTTCCTTGG TGGGGAATGG TGAAGGGAGG CAGGAGTTAA ^ 13950 
AGAGATCCTA AGTCATTTAT AAACTTCTCT GGAAAGACAG GTGTGTGAftG 1 
arTTTTTAAA AAGTCATTCA CCAAATTGTG TGTGTGTGTG ^1^^^^ 
SSHUIrAr TTTATTTTTT AGAGCAGTTT TAGGTTCACA GCAAAATTGA 

iii liis = Ess* iii 
1™ asss lis = = MM0 

TGTATCCACC ATTATAGTAA CATACAGAGT ATTTTCAGIO "± 14350 

iiil lis Ss ssss sg= 
CCCTCAGGTT
CTTCCTCTTT 
AAATAAAAAT 
AAGCAGAACT 
GCGCCACTGC 
CAAAACAATG 
AATAGGTCCA 
TGGAAAAAAT 
GCTTCGTACC 
GCACCAAGAC 
GAAGAGAGAA 
AAGATTCACT 
GAGAAAACTG 
TCCCCTCATC 
GCAGGTTTCC 
AAAAAAAAAA 
TACAGTAAAT 
TGGAGGGAGC 
CAAAGACACA 
GGTGAACGAG 
AGCAAGCTTG 
AGAAACCTGC 
CTATCAGACA 
GGGCGCAGTG 
GGAGGATCAC 
AGACGTTGTC 
TATTGTATAC 
ATTATGAGGA 
TATTTTATCC 
TAAATTGACA 
TCTTTGAATA 
GGCGTTTTTT 
GGCACAAGCA 
TCCCACCTTA 
CCCAGCTAAT 
GGCTGATCTC 
AAAGTATTGG 
TAGACTTGGC 
TGTAGACAGG 
AGGGGGATGG 



CCTAGAAGAT 
TAAACAATGA 
TAAACAATAC 
GCTTCAACCC 
ACTCCAGCCT 
TGATTTCCTC 
CCAGGAAAGA 
AGTTATACTT 
TTGGCCAGAG 
AGACTTCCTA 
GTTACTGGCA 
CTATATTTTA 
TTATTTCTCA 
TGACCTGTGG 
TTATCATGAT 
AAAAAAAAAA 
ATTAATAAAA 
AAGTGGGTAG 
ATGATAGATT 
ATCTGTGGAC 
TCAGGGGATT 
CCTAGGGGGC 
TCAAATGGAA 
ACTCACACCT 
TTGAGCCCAG 
TCTATTTTTT 
ACCACTGAAT 
ATATTTGATT 
AGTTATGAAG 
GAATAGTAAT 
CCAGGTTGGA 
TAGACAGAGT 
TGGCCCACTG 
GCCTCCTGAG 
TTTTTTATTT 
AAACTCCTGC 
GATTATAGGC 
CCTTTCCCAC 
AAACTGTCCT 
TCGGTGGGGG 



CAGTCCTTCA 
TTCCCTTTCT 
TGCCTGTAAT 
GGCAAGCAGA 
GGGAAACAGA 
CTCTAAGTCC 
AGGAAGTAAG 
TCTTGCTTGT 
GCTTGTCTCC 
ATTTTCGATC 
ATCTCAAGTC 
ATTAACGTCA 
CACCTAACAA 
AGGAATCTGA 
GTTTGTCATG 
GGCGTCCTGG 
CAGTGATTGT 
AATCGCGTCA 
GAAGGATATT 
TTCTGGGCTC 
CTGATATTGA 
CATGAAAATT 
GTTAAATCGT 
GTAATCCCAA 
GAGTTCGGGA 
AATAATTTAA 
TATAATAATG 
ATTTCATATA 
TATTTAGAAC 
CAGAGAAAAT 
GTTGTTTATG 
CCCACTCTGT 
CATCCTTGAC 
TAGCTGGGAC 
TTTGTAGAGA 
ACTCAAGTGA 
ATAGCCACCA 
CAGTCATTTG 
TTGCTCATCA 
AAACTGGGGT 
ATTAGATTCA GATTGAGATG



ATCATGCCCA 
CTCAGCTACC 
AGTTGCAGTG 
GCAAGATTCT 
TGCACAGGGA 
AATGTTTGAC 
CTTCCTAACA 
TGCGTACCTG 
CCAAGAAGGA 
AACCAGGGTG 
GTCCGTCATG 
TTAATGAGAT 
ACAAGAGGAG 
TTCAGTGTGA 
ATATAACTGA 
AGCTGAAGGA 
AACTAAAGAG 
TATTCTAAAT 
CAACGTTAGA 
AAGGCTGTGG 
TGTCCTGTCT 
ATCTTAACAA 
CACTTTGGGA 
CCAGCCTGGG 



ATAAGAAAAC 
CAGGAGGCAG 
AAGTGAGATC 
GTCTCAAAAA 
AATGTTAAGA 
TAGATTGTCT 
GTTCTCCAAA 
AGGTTTGGTG 
ATCAACCTTT 
AAAATTTTTA 
AGAATGCTTT 
TAACTTCCTC 
GAGGCAGTGG 
GGCCTCACAA 
GAGCTCATTG 
TAGAACTGCT 
CATTTCTAGC 
ATAGAATATG 
TTCTGATTTT 
CCTTCACCTG 
TTCAGAAGTG 
TTACTAGGAT 
GGCTGAGGCA 
CAACATAGAG 



ATCCTCTGTG 
TGTGCAGCTT 
GCCAGCACCC 
ACAGGATGGT 
GTCTGGCTCT 
TGCAGCCTCT 
AGTAGCTGGG 
TTTTTAGTAG 
CCTGAGCTCA 
CAAGCGTGAG 
GTTGTGGTCC 
CAATGGTTCC 
CACTGCAGAT 
AACCTTAAGA 
TTCCAGCTTT 
AGTGTCACCC 
TCCGCCTCGT 
GTATTTTAGT 
TCCTGACCTC 
ACAGGTGTGA 
CCTTTGAATT 
GTGTATGTTC 
GGTAATTCAT 
CTCTGCTCTC 
ATAAATAGCT 
AAATAGACCT 
GATCCATCCC 
CAGGAGCAAT 
CACCTACTCA 
AAGATATTTG 
AAGTTGTATA 
AAACTCAAAA 
CTGACATTGC 



AGCCCAGGGA TGGATGAGGC 
TCCAGAAAGG AAGTCATCAG 
GGCAACCCTG CTGTCTTGTG 



GTGGATTTTT 
GTCGCCCAGG 
GCCTCCCAGG 
ACCACAGGCA 
AGGCAGGGTT 
AGCTATCCAT 
CCACTGTGCC 
CAGAAGCTCT 
CACCCATCCT 
ACAGTCCATG 
GAACTGCACT 
TTTTTTTTTT 
AGGCTGGAGT 
GGGTTGAAGT 
AGAGACGAGG
AAGTGATTCG 
GCCACTGCAC 
GTTAAATAAC 
GATATTTCTT 
GAGCCGGCAA 
CTACCTCATA 
AACTAAATAA 
TCTAAATGAT 
TCCTGATGTG 
TGCTACTCCG 
AGTAAGAAAT 
AATAGGTTGG 
AAGCTGAAAA 
CTTGCTTTTT 
TTAGTAGTCA 



CTTTTTTCTT 
CTGGAGTGCA 
CTCAAGGCAT 
CACACCACCA 
TTACTATGTT 
CTGCCTTGGC 
TGACCAGGGT 
GATGGTACCA 
ACCTCCCATG 
TAAAACAAAT 
ATGTTTTCTT 
TTTTTTAGAC 
GCAGTGACGT 
GATTCTCCTG 
TTTCACCATT 
CCCATCTCAG 
CCGGCCAGTA 
TTGTAGCTAT 
AGGAAACCTG 
ATTTGACATG 
ACCAGAACTT 
ATATATGAGA 
CTCTTCCACT 
GAGGAGAAGT 
AGAACACTAC 
GAAAGGCACC 
ACTCGGGCAC 
TACTGAAGCA 
GGTTTTTTTG 
CAGAATGAAA 



AGAGAAAAAA 
TGTATATAAT 
TTATATCTTT 
AATTCATCAG 
AGAAAAAGAC 
GGTTTGTTTT 
TGCCCAGGCT 
CTCTTGGGCT 
CACAGGTGCA 
CAGTCTTTCT 
TCCCCCTGCC 
CACCCAACCT 
TGTCCAAAAG 
GTTTTCTTCA 
TATGCAAGTT 
ACCAGCCGCC 
CCAGTCAGCC 
ATAAAGAAAT 
TTTTTTTTTT 
ATGGCGGGAT 
CCTCCCACCT 
CGCCCAACTA 
GTCCAGGCTA 
CTCCCAAAGA 
GGATTTTTTC 
AATTCCAAGC 
ATGGCAAGAG 
TGCTATGGAT 



ATACTGAAAA
GTATATATTC 
TCCTTCTGTT 
TAATTGGGGC 
AGATGGGTTA 
TTGTTTTGGG 
GGAGTGCAGT 
CAAGCAATCT 
TGTCACCACA 
ATGTTATCCA 
TTGGCGTCCC 
AGTTTCTATT 
ATCTCATAAA 
TCCTGTGTCT 
CCTCTGAAAC 
AGCGAGTCAG
GGCCCTGGCA 
GGTCTGCCTG 
TTGAGACAGG 
CTTGGCTCAC 
CGGTCTCCCG 
AGTTTTCGTA 
GTCTCAAACT 
GCTGGAATTA 
AAGTGCACAT 
GAAAAAAAGT 
GAAATCACCA 
TTTGAAAGTG 



TCTAACAATG 
TGAATATATT 
CATGAACATC
CCAGTATGCC 
AGACAGTCTT 
TTTGCAAACA 
TAAAACAGTC 



ACCTTCAGTG 
CCTCATGTAA 
ATTCATACCT 
TTGGTTTAGA 
GACCACCAAG 
CAGGGCTAGC 
TTCCAGTCTT 



CTCTAAAAAA 
AATTAAAATA 
TGAGGTCCGT 
GCATTAAGCA 
CAGCATTCTC 
TCAGCTACCC 
CAAATTAGGA 



CATTAGAGTT 
AGTGTCTCGC 
GATCTCGGCT 
CCTCAGCCTC 
TGGCCAGGCT 
CCTCCCAAAG 
ATTTCAAGCT 
GTCCAACATA 
CCCTTGGTTG 
TGTTACAGAA 
AATTATCCTG 
TTTCAGTCTG 
TGCAGATATT 
TACGGTTGGA 
CAGAAAAAGT 
CTAGAGATGT 
CAATCTAGCA 
TTTCCCAAAT 
TTTGTTTGTT 
GATAAATCAA 
CTACGGAGTC 
CAGACATATA 
CCCCCTCCCA 
GGAGGGCCCT 
TTTTTGTTTC 
ATTAGTATGT 
TGACATTGTC 



CTCTGGTAAT 
TTTGTCGCCC 
CACTGCAACC 
CTGAGTAGCT 
GGTCTCGAAC 
TGCTGGGATT 
TCTGAGGAGC 
TCCATGTTCA 
TTTTCTTTGT 
TATACCTTTT 
CTTTAGTCAC 
CTCACTGTGA 
TGCAAATATG 
ATGGCCCTAC 
TCAAGAACAG 
TCCAGCCCCA 
AGTCCTACGG 
GGGAAATCCT 
TTTTCTTCAT 
TCATTCATGA 
AAGGAAAACA 
AAGGGCAAAA 
GAAATAACCC 
GAGTCACTCC 
CTCTGTGGCT 
TTTCAGTCAC 
ACATGGGGCT 
TTAAAGCAAG
ATCTCACTCT 
CTGCAACCTC 
ATTCATTATG 
TACCCCCGAT 
GCCACCACGA 
ACCATGTTGG 
TCCACCCACC 
ACTCCTGGCC 
TTTGATGTTT 
GAAATAACCT 
AATGATCTTG 
AGTATATTAT 
TGGAGTTCAA 
ATATGGTACA 
ATGCTAATAG 
GAGGGGGATA 
AATTTGGCAT 
TACCAGAGAA 
TCTGCTAGGA 
AAAACACATG 
CATTATACCA 
GATCTTTCTG 
CATAGTTCCT 
GGTTAGTTAG 
ACAAACACAG 
AATTTTCAGT 
AGTTTGGGCA 
GGATTCCACA 
TGACATGGGG 
AAGGACATGT 
CCCAGGCTGG 
CTGAGGCGGG 
ACATAGTGAA 
GTGGCGGGCG 
CTTGAATCTG 
CGCTAGCCTG 
AAAACAAACA 
GATACAGGTA 
TAAGTAGAAG 
TAAACCTGTT 
GTGTGACTGC 
AGCTAACTCT 
TTCTTCCGTT 
CATGGTACAG 
TGGTTATCTT 
AGAGTCTGGA 
GTTCTTCGTA 
TTTTGTTTTT 
TGTAATCCCA 
GGAGTTCAAG 
AATACAAAAA 
TGCAGAGGTG 
AGTGAGCCAA 
CTCTGTCTCA 
ACTCAGTCGT 
TATAGGGGGT 
TGGAAGAATG 
AGGTTTTTTG 
ATTTAGCTGT 
ATTTCAACCT 
ACTTTAGATC 
TCATATTATA 
TGATTTCTGA 
CCTATAATGT 
GTAATGGTAC 
TTATTTGAAT 
TAGGCTGATG 
ATCCAGTAGA 
ATGTCTGTCA 
CAGCAAATAT 
GTGGTCTTTA 
AGAGTCTTGC 
CACTGCAACC 
AGTAGCTGGG 
TTTTAGTAGA 
TTGACCTCAG 
AGGCATGAGC 
TATAGTTATT 
CCACTTTTAA 



TGAAACAAGG 
TGTCGCCCAG 
CACCTCCCAG 
AGGAATATTT 
CATATTATTG 
CCGGCTAATT 
CCAGGCTCCA 
TCAGCCTCCC 
ACAATCCTTT 
ATACCCAACT 
GCTCAGATAC 
AAGTTACTAT 
TTTAATTAAT 
TGTATCAGAT 
GAAAAAAATG 
CTAATGTTGT 
TACTCTGACA 
GAGGCAGGGC 
CCACAGAAGT 
GAAGACCCAG 
TCCCGGAAGA 
ATGTATCTTA 
TTACCAAAAT 
ACACCAGGGG 
TGTAAGTCCA 
AATTTTATTT 
TTTCATGGGC 
CCTACTCATT 
AATTGTTCTG 
ACATACCACA 
TCATTGCTTA 
GTGCTGTGGC 
TGGATTACCT 
ACCTCATTTC 
CCTGTAGTCC 
GGAGGCAGAG 
GGCGACAAAG 
AACAAAAAAC 
AGTTTTCTAA 
ATGACAAAAA 
TGAGCAGGAA 
AGAAAGGATG
TTGTACTTCC 
GCCTACACCC 
TCCAAGGGAA 
CATTCCAAGG 
AAGGATTGGG 
TTTTGGGGAA 
TTTTTTAAAG 
GCACTTTGAG 
ACCAGCCTGG 
CTAGCCGGGC 
GAGGCAGGAG 
GATCATGCCA 
AAAAAAAAGA 
CAATAGCCTC 
GTATAATAGA 
AAGAAATGGA 
AAAATGCTAT 
AAGGGTTTTT 
TGGTTTTATG 
CATATCTGAG 
GTCTATAAGT 
TCCAGGGCAC 
GTGACTAAAG 
TGCCACCAAA 
CTCAGTTTCC 
ATCCTAAAGC 
ATGCTGGGTC 
TTCCTTGATG 
CAAAATACCT 
TAGTTAATAT 
TCTGTTACCC 
TCCACCTCCC 
ACTACAGGTG 
GACGGGGTTT 
GTGATCCACC 
CACTGCACCC 
CAAGTAATTC 
GGAGAAAGGG 



AACCCCCTTT 
CCTGGAGTGC 
GTTCAAGAGA 
GATTATTCAG 
ATTATTGAGT 
TTTTGTATTT 
GGCTCGTCTC 
AAAGTTCTGG 
TTTAACTATG 
GAGGGATGAT 
TTCAAGCTCT 
ACTTTGTTTG 
TATCTTTGTA 
GGATTTCAAA 
TGATCCATAA 
CCTCTAAAAA 
CTTTAATAAG 
CATTTCAGAT 
AAGGCCACAT 
TTGTATTAGG 
TATAGGTGAG 
CATTTCTAAG 
GGAAGGTTTC 
AAATGCCTTC 
GCCACCCTGT 
GCATTTGTAA 
ACGTCATGGA 
GTAGTGACAA 
AACCTGTGGC 
AAAGAAGAGG 
GCTAGTGGCC 
TCACGCCTGT 
GAGGTCAGGA 
TACTAAAAAT 
CAGCTACTCA 
GTTGTGGTGA 
TGAGACTCTG 
AACAACAACA 
CACAGGTCCT 
CATTTGTCAT 
AAGGAAGCAA 
ACTCCCTTAT 
TCTTCTCCTC 
AGGCCCACAT 
AGATCTGCCA 
ATCTCTCCAC 
AATAAGATAA 
GGAGTAGGCT 
TAGATGTGGC 
AGGCTGAGGC 
CCAACACAGT 
TTGGTGGCGT 
AATCACTTGA 
TTGTACTCCA 
GAAAAGAAAA 
TATTCCAGGA 
ATTTCGAGCT 
GGAAGGGTAA 
AATCTTTGTT 
TGTGATTTAC 
GCGAAGGCAT 
GTTCCTGTCG 
GGGAGAGTTG 
TTCCTACAAC 
CAGGTCACTC 
CGGCTGCGTG 
TCCTAGAAAA 
CTCCATACTG 
ACAGGACTCT 
GTAGGCAGCA 
AAAGCAGCTT 
TAAATAGTTA 
AGGCTGCAGT 
GGGTTTGAGC 
CATGCCACTG 
CACCATATTG 
TGCCTCAGCC 
AGCTTAAATA 
AGGCCAAAGA 
TGTAAGTTTG 
TTTTTTTTTT TTGAGATGGA
AATGGCGCAA TCTTGGCTCA 
TTCTCCTGCC TTAGCCTCCT 
TTCCTGTAGG GTAAAGATAT 
AGCTGAGATT ACAGGTGCCT 
TTTAGTAGAG ACAGGGTTTC 
GAACTCCTGA CCTCAGGTGA 
GATTACAGGC GTGAGCCACC 
AAATATATTT TTATCTGAAG 
GTTCCCATAT CTCAGTTAAA 
TCTTTTGACT TTTGAAAATA 
GGTTAGTTAA CATTATTTAA 
AGATTTTACT GTATACTACC 
TTTATGTACA TTTTTTATGT 
GAAATCAGAA AATAGCGCAT 
ACTTATTTTT GCATTTTTAA 
TGTAATTAAT TATTGACTGG 
CCCATTAAAG GAATGACACA 
TTGTAATAAA TCATTATAGC 
TAATTAATGG ATTTGCTCTT 
TCTTGGGGGG CCGCATTAAA 
AAAGTTTTAC TACTTTACAG 
CAACTCCAGG ACTTGGCTTT 
CTTTGCTAAC TATGCAACCA 
TGGCAATGCT AAAAGGTACA 
ACATTTGATT TCTGGCTCGA 
AACAGAAATC TTCTGTGTTT 
ATATTTCAGA AGCCAATAGG 
TGAGACTGGT AATGGCTGAG 
TAGCAAAAGG CTGCTGAGAT 
TGCACCCTTA AAACACATGT 
AATCCCAGCA CTTTGGGAGG 
GTTCGAGACC AACCTGGCCA 
ACAAAAATTA GCCAGGCATG
GGAGGCAGGC AGGAGAATTA 
GCCGAGATTG CGCCACCGCA 
TCTCAAAAAA ACAAAAACAA 
AAAAAACGGG TATCCCAGAA 
CTTGTATGGT GCGTTCCACT 
GAGAATATAG ACTCACATTT 
TGTTACAGAT GTAATTCTGG 
TAAAGTAGTC ATCCTGAGTG 
CTGTTCCCCT CATCACCCCA 
TGGATGCTGA CATAGACTTA 
TTTTTTTCAA TGTGTCATCT 
TCTTTATACA GTAAGAGATG 
TGAATTGTAA GTTTTAAATT 
AGGTGGTCCT TCTGTTTTTT 
CAGACGTGGT GGCTCACGCC 
AGGTGGATCA CTTGATGTCA 
GAAACCCCGT CTTTACTAAA 
CCACCTGTAG TCCCAGCTAC 
ACCCGGGAGG TGGAGGTTGC 
GCCTGGGCGA CAGAACAATA 
GAAAAAAAGA ATGGATTTGA
GATGTTACAG TTGATTATGT 
ATGTAAATTC CAAGTGCATT 
AGTATGAGTG CAAGCATTCC 
CAGGGCTAGT ACAAAGTGCT 
AGACAGTTTT CACATGTGTC 
GTGATGGTGC TTGTCCCAGG 
GGCAAAGATA TTACCCCTGA 
TGCCTGGAGC TCAAGTCTTA 
ATGATTTTGC AATATAAAAG 
ACCCCTTGTA ACAGACTCTA 
ATATTGGGCA AAGACTTACC 
ATGAGGGTGG AGGTTAAGCA 
CCCTAAACTG TGGCTCTAAG 
AGGGAGCTTT TCAAACCCAA 
GTTTATGGAA GTGGGCGACA 
GCAAGAGTTG TTTCTGCCTA 
ATTTTTTTTT TTTTTGAGAC 
GCAGTGGCAC AATCTCGGCT 
AATTCTGTCT CAGCCTCCCA 
CACCCAGCTA ATTTTTGTAT 
GGCAGGCTGG TCTCGAACTC 
TCCCAAAGTG CTGGGATTAC 
GCTAATATTT AATATTATTC 
CTTAGAAACA AAACAAAAAG 
CCAGATAGAT AGAGATCTTT
CTTTTTTAAC TACAAGAGTT
ATAGATATAC ATGAAAATTG
TTAAAGACAA CACTTAAAAT 
TAGAACAGCT AATGGTTTAA 
GGCACCTTAA TATCGCAGAA 
AATACCTGTA TTTTGATTAT 
AATAGGTCCA ATAGTAATGC 
GCAAACTTAA AAGATCCTAC 
GAGTTGAATT TCAGATAAAT 
CTTTACTTTT TTTTTTGTTT 
GGGTCTCATT CTGTTGCCCA 
ACTGCAGCCT TGACCTCCCT 
CCAAGTAGCT AGCTGGGACT 
TTTGTGTTTT TTGTAGAGAT 
TGAACTCCTG GGCTCAAGTG 
GGATGACAGG CATGAGCCAC 
AATGGTTACA TAGGACATAC 
TCAAGTTTAA CTAGGTGCCC 
TACCCATGCA TTCACTGGTG 
AACCATAGTC CTATAACTCT 
ATTTTAAATT AATAAATAAT 
TATAATTAAA ATTATCAAAA 
ATTTAGATTA TGAAGAGTGG 
TGCATGTGGA GCACTGAGCT 
GTCACTTGAA CAAAACCTAA 
TGGGATTTCA TTCAACAGCT 
AATTTGTCCA ATTTTGTTGT 
AACTAGAATT TCTTCAGTTT 
GCGAATCTGG AGGCCTTCAT 
CGTTTTTCTT TTAGGAAGCT 
GCTCAGGACT GGACTTGATC 
GATTTGCAGT GGAACAGTTC 
TTCCAAGGGG TATAACATTT 
GGGAACAATT CATTAATAAG 
TTTCTTTTTC TTTTCTTTTT 
TGCCCAGGCT GGAGTGCAGT 
CTCCCAAAAC GCCATTCTCC 
AGGCACCCGC CACCGCGCCC 
TTTTTTTGCA TTTTTAGTAG 
GTCTTGATCT CCTGACCTCG 
TGGGATTACA GGCGTGAGCC 
CACTTTTTTT TTTTTTTTGA 
AGTGCAGTGG CGCCATCTCG 
CGCCATTCTC CTGCCTCAGC 
CCACCACGCC CGGCTAATTT 
CCGTGTTAGC CAGGATGGTC 
TCGGCCTCCC AAAGTGGTGG 
AACACTCTTT TTATTATTAG 
CAAGTGCTCA ACAATGCAAC 
CTGTATTTAT TCCAGAACCT 
TGAAGTGAGA ACCAGTTGGA 
TTGAGATTTT CAGAATCACT 
AGCAAAACTG GGAAGTCAGC 
GTTTCTCAAA TGTGTCAGTT 
ACCTGCCCAA GCGGTCTAGA 
AGCTCCTGAC TGTCTCCTTC 
TTTTAACTAT AAGTATTCAT 
GCTTTTTCCA CATATCAGCC 
ATGTAGTAAT AGGATAAGCA 
TTTTTTTTTT CAGACAAGAT 
GGCGTGTTCA TAGCTCAATG 
CTCACACCTC AGCCCCCTGA
TTTTTTTCTT TTGTCTGGTT 
CTCAAGTAAT CCTCCTGCCT 
TGAGCCACTG TGCCCGGTCT 
ATTAGATATG GAATATAGTC
ATTACCCTCA TTATTAACTT 
TTATACAGTT AAAATTTTTG 
CTATGAGTTT TACTTTACTT 
CTCTGTCACT CAGGCTGGAG 
CTCGACCTTC TGGGCTCAAG 
ACTACAGGCA TGCACCACCA 
AACAAGGCTT TACTATGTTA 
GGGATCCTCC TGTCTCAGCC 
CATAGCGCCA GACCTGGTTT 
TGTAATTTGG AAAATGTTTT 
TTTAAATACA ACATTTCTCG 
AACCTAACAG TTTCCTTAAG 
TTAGGAGAAG ATTTTATTCA 
CAAAAATGCA AAACTCTATG 



CAGGAATGAA TTACTCTTTA ACAAACGACT 
GAAGGACTTA TTATGCATAT GATAATCAAT 
TATATTGTTG CCACTCTCAA AAAGTGGTAA 
AAAGCAGAGT ACAGAAGTTC CCAAACTTAT 
AACTTTTTAA AGCATGCCTA GGCCACAAAA 
TAAATTGTAA GGTCTACACA ACCTAATAGT 
TGTCCAATAG ATGTTGATGT TTTTTTCCTT 
AGTGCCTCTG TAAATAGCAC TGCCTGGTTA 
AATTTTTTTC ATGTTAATTA TTTTTCTTTT 
TTTTGTTTTT TTGTTTTTTT TTTTGAGACA 
GGCTGCTGTG CAATGGCATG ATCATGGCTC 
GGGCTCAGGT GATCCTCCCA CCTCAGCCTC 
ACAGGTGCTT ACCATCATGC CCGGCTAATT
GTGGTTTTGC CATGTTGCCC AGGCTGGTCT 
ATCCGCCCGC CTCGGCCTCC CAAAGTGCTA 
TGCACCTGGC CCCTGGGCGA AGTATTTCTT 
ACTAAACATT ATTTATTGTC TATATGAAGT 
TGCACTTTTA GTTGCTAAAT CCTGTAGCTG 
CTCCCCAGCT TGCCTTGCAC AGAGTTTGGA 
AGGCCAATTT TTTAATGTAA AATTTGATTC 
AACAGGAATT TTTTTAAAAA TTGTTTTAAA 
TATTTTTTAA CTGAACTTGT GACTAGAGAT 
GGTTTATGCT AACTAATGAC AGTCTGGCTA 
ATAAATTGTG GCTTCCCCAA TTCTCCTGAT 
GTGTCAGACC AGAGCTTCTG GTATCTTCCA 
GGAGCAAATG AAGTCAGATT GATTTTTTTT 
CTCAAAAACA TAATTATAAT CATTTATTAG 
AACAACAGAA ATAGTTATTC ATTATGAAAA 
TGTGGTGCCA ATCTAACCAT TAAATTGTGA 
CTGTAGATGT GCTATACACT TTTGCAAACT 
TTTGGCCTAA ATGCGTTATT AAGAACAGCA 
TAATGCTCAG TTGCTCCTGG ACTACTGCTC 
CTTGGGAACT AGGCAATGGT GAGTACCCCA 
GAGATTCCCC ACTAGCATTA TTTCTTTTCT 
TTTTTTTTTT GAGACAGAGT CTCGCACTGC 
GGCGCCACCT CGGCTCACTT GAAGCTCTGC 
TGCCTCAGCC TCCCGAGTAG CTGGGACTAC 
GGCTAATTTT TTTTTTTTTT TTTTTTTTTT 
AGACGGGGTT TCACCGTGTT AGCCAGGATG 
TGATCTGCCC TCCTCGGCCT CCCAAAGTGC 
ACCAGGCCCG GCTAGCATTA TTTCTTATGA 
GACGGAGTCT CGCTCTGTCG CCCAGGCTGG 
GCTCACTGCA AGCTCCACCT CCCAGGTTCA 
CTCCCGAGTA GCTGGGACTA CACGCACCCG 
TTTTGTATTT TTAGTAGAGA CGGGGTTTCA 
TCTATATCCT GACCCCATGA TCTGCCCGCC 
GATTACAGGC GTGAGCCACT GCGCCCGGCC 
CAAATATACT TCTGCCTGGG CACATTCTTG 
TTTTGGAAGT GCATGTGGCA GAAACTCCTG 
ATTATTGCTA ATCCCAGTTT ATGTTACATT 
GCCAGCAACG TTCCCAGCTC CAAAGTTCCC 
TAACCCTATT ATGCTTGGCA ACCTGGACTC 
AGTTTGTTTT ATTCATCCCT TCCTTTCTCA 
AATCTCAGTA ACCCCATTGC AACCTTCATT 
ACTTGCCAGT ATAGAATCCT ACGTGGGTCA 
TTCACTCTTT TTTTGCAAAG AACTTGTAAA 
GATTCGCCAC ATTTATTCAA AACATAGAGT 
AATGGAAATA AGGATTAAAT GGGAAATGAA 
CAAGTCTTCT TCCTGCTCAA ACTTTTTTTT 
CTTGCTCTGT TACCCAGGCT GGAGTGCAGT 
TAACCTCCAA CTCCTGGGCT CATGCAATCT 
TTAGCTAGGA CTACACTATG CCTAGCCAAT 
GTGTTGCCCA GGCTGTCTCG ATCTCCTGGC 
CGGCCTTCTA AAGTGCTGGG ATTATAGGCA 
CAAACCTTTT TTTCCAAAGT AAATGAAGTT 
TAGTTCCCAG ATATCCATAT CCATTGGTTT 
CAAATTGTTT AATAGACCCT CATATCTCAG 
TTTTGTTTTT CTGGAGTATC TTATTTATAA 
ATTTATTTTA TTTTTTGAGA CAGACGCTTG 
TGCGGTTGCG TGATCATGGC TCACTATGGC 
TGATCCTCTC CCTCAGCCTC CCAAGCTGAG 
CATCTAGCTA ATTTTTTTTT TTCCCCATGG 
CCCAGAGTGG TCTCAAACTC CTGGCCTCAG 
TACCAAAATG CTGGGATTAC AGGCATGAGC 
TACTTTTCTT GACTTTGAAT TACAAGTTTT 
GTTGCTTTTA AATACTGCTG TATGTTTGCT 
ATATATATTT TGAGAATTGC TGTCTTTCAG 
AAGGCTGATA TTTTCATCAA TGGGTCGCAG 
ATTGCATAAA CTTCTAAGAA AGTCCACCTT 
GTCCTGATGT TGGTCAGCCT CGAAGAAAGA 
CGGCTAAGAT
TTTTCTTCTT 
GAGTGCAGTG 
AAGCAATCCT 
ATCACCACAC 
CCCTGTGTTG 

TCCACCTCAG 

CCAGCCACCA 

TCCTCAATAG 

TTACCAAAAA 

GCAAATGTCA 

TTACCTGATC 

CATGCGTATA 
ATTTTTTTTC 
ACATGGCATC 
CTAGCTTTAT 
AATTGACTGC 
ACTCCAATTT 
TAGTGGTTTC 
CCATGGAACA 
ACCCAGGAGT 
GATTTCTAGT 
TCTTGGTCTT 
TGGGAGAGGT 
AGTAAAATTT 
TATTTGAATG 
ATTGGACATT 
TTTTAAACTT 
GTTCTAAATT 
AACAAGCACT 
TTACCCCATG 
TAGAATTAAC 
CTTCCACAGC 
ACTATGAACA 
AAGTTTTGAA 
GCTTTTTTGC 
GCACCAGGCC 
GGAGGCGGAG 
AGTGAAGCAG 
TCTATTCTGC 
TCAGTGGCTG 
ACTAAAGCTT 
AGGATGAATT 
GTCAAGTAGT 
CCGGATATGG 
GAGGATTGCT 
CCACTGCACT 
CTCTGTCACC 
CTCTGCCTCC 
TGGGACTACA 
GAGACGGGGT 
AAGTGATCTG 
GCTACCACGC 
TTAGAGCATA 
CTATAATTCA 
ACCAGAGGGG 
CCAACACCAC 
TAACTCTATT 
TTTTTTTCTT 
TTATAAGCCT 
GAATAAAGAA 
TCGGGGAGAA 
AGCTTAGGTG 
CTTTTTCTTC 
GGAATTGTCT 
TAGCATGGTA 
GGGAGGGATT 
TTAAATGACT 
CCTCCCTCAA 
TGGTCCTGTA 
CCAATTATCA 
AGCCAGCTGT 
CCGAACATTC 
TTAGGAAGTG 
AGTGGAAGTG 
AACTGGTTGG 
GTTCAGGGTC 
TGATGTTATC 
AGGCTGCATT 
AGTCCTGCAA 



GCTGAAGAGG 
TTTCCTTTTG 
GTACAATCAT 
CCCATCTCAG 
CTGGCTACTT 
CCCAGGCTGG 
CCTCCCAGAG 
CTTTTCTTAA 
TCCACATGTT 
AAGGAAATTT 
CCTATGATAA 
CTAAAGCAGT 
TTGTGCATAT 
TAGCTTCCTG 
AGTAAGTATG 
TTATTACCTA 
AGTTCAAATA 
TAATATTAAT 
TATAAAGATC 
TATAAGTAGC 
ACATGTCCTT 
TACTTGCATA 
TCCCTAGTAG 
AAGAAGGAGA 
GTTATTTTTT 
GACGGACTGC 
TTTATTTCAT 
TTTAATGTAA 
CTATAGGTAT 
ATGACTTATC 
TACGTGATTA 
TCACATAGAT 
CTACTATTTC 
TATTTTATAA 
TGCTGTTAAT 
AATTACCATG 
TGGCAAGAAG 
CGCCCTTGCT 
CGCTGGCCTT 
TGAAATAGCT 
AGCCCCAAAA 
GAGGGACATC 
GTTTCAGAAA 
CCTTACTCTA 
TAGTTCCCTG 
TGAGCCCAGG 
CTAGACTGGG 
CAGACTGGAG 
CGGATTGAAG 
GGAGTATCAC 
TTTGACATGT 
CCTACCTCAG 
CCGGCCACAC 
TTACAGCTTT 
TAGATTCCCA 
CTATCATTAA 
AAACTTGATT 
AGTGCTTTTA 
TTCTCTTCCA 
AGAATACATC 
TGGAGATGTT 
GGGGGATAGA 
CAATTCTGCT 
AGCCCTCACA 
ATAGAGGTGG 
ATAGTCTTCT 
CTGCTGCTGC 
TATTTATAAT 
AGATCAATAA 
ACCACCCAAC 
AGACAGGGGA 
GCAGGAGACC 
GAGGATCAGA 
GAGAGTGCTG 
AGGTTTTCTT 
GCCAGATTAC 
TGCAAGATAT 
CCCAGGAACA 
ATCCCTAAAC 
AGGTAGACTT 



TAGGAACTAG 
AGACAGAGTC 
GGCTCACTGC 
TCCCACAAAT 
TAAAAAAATT 
TCTCTTGAAT 
TGCCAGGATT 
AAAAAAAAAA 
ATTAAACAAT 
TGACGGGTTC 
AATTTGCTAT 
AACCAGCCCA 
ATATGTATTA 
AAGGCTGGTG 
TCTCCTATTC 
GTATTCAAAA 
AGAAACAAAT 
AAAAAAAATT 
ACTTTATACA
TAAAACCAAT 
GCCACTGTGT 
GAATGGACTC 
AACTTCTACC 
ATAAGGTCAG 
TTCTGAATAT 
TACCAGGGAA 
CTGTGCAAAA 
AACCAGAATC 
GTATATTTAC 
CACTGTTAGT 
GAAATTTGAA 
GATAAGAATG 
AATAAAAGAA 
CTATATAGGA 
CTTCAACACC 
GATACTTTTC 
GTCTGGTTAG 
ATCCGACACC 
AGGGGTCAGA 
CCCCAGCCAA 
TTCATGCCAG 
TTTAACAAGT 
TTTTGGCCTT 
AAGAAGTACA
TAATCCCAAT 
AGTTTGAGGC 
CAACAGAGTG 
GGCAGTGGCA 
CGATTCTCCT 
CGCACTGGGC 
TGCCCAGGCT 
CCTTCCAAAA 
CCTGTCTCTT 
GTCTCTCAGG 
AGAAGTTTAG 
ATTTAAAGAT 
GCTTTAAAAT 
ATCTATACTG 
TCTTCATTCT 
ACAAATCCTT 
TGTTTTGCCA 
GAAGGAGAAG 
TATTTTACAT 
CATTGTTTGT 
GAATTTGTCT 
AGGATTTGTT 
TGCTGCTGCT 
TGATGACACT
ACCAGAACCA 
AGGTTCACCT 
ATTGCAAAGG 
AGAGTTTTAT 
GCTTTTAAGG 
GTTGGTCAGG 
GCTGTCTTCT 
CGGTCTGGGT 
CTCAAGCACT 
ATTTGGGGAG 
CGTAATCTCT 
GTCCCCAGGC 
AGGATGCAGA
TCACTCTGTC 
AACTTCGACC 
AGCTGGGACT 
TTTTTGTAGA 
TCCTGTGCTC 
ACAGGCATGA 
AGATTCTCTC 
CTGCTGCCTG 
AGAATATCAA 
CAAAATTAGG 
TTTCTAGGGA 
TATGACTGAG 
GAGAAGTGAT 
TTAATACTAG
AGTTAGTTCA 
AGTGTCTCAA 
TTAAGTTATT 
GAAGAACAGT 
TGCTTGCCAA 
TTTTTCAAGA 
CTCCTCATAA 
TTTTTTTAGT 
CAATTAACCT 
TTTCTGTGTA 
GATTTTCTAA 
AGTTTTCCAG 
CTTATTTTAT 
ATGTTTTTCT 
TTTCCCCTTA 
ATATTTCCAA 
GGTTGGTTCA 
AGTTTCCCAA 
GGGGTGGGTC 
ACAGTTGAAA 
TGTTCTATAG 
GAGAAACAAG 
TTTGCAGCTG 
GTGCAGCTCT 
AAAGCAGATC
ATTTTGCAAG 
GTTCCAAATT 
TAATTATGGC 
CTGTAAAAGA 
ACTTTGGGAG 
TGCAGTGAGT 
AGACTGTCTT 
CGATCTCACC 
GCCTCAGCGT 
TAATTTTTGT 
GGTCTGAAAC 
TGCTGGGATT 
AAAAAAAAAA 
AGGATACTTA 
AGCCTAAAGT 
TTGTTAAATC 
ACTGGTTTAG 
CTATATCCTC 
TTTTTCTCTC 
TATGCCCATG 
TTAACTAAAG 
TGGGAAGAGG 
TTTACCCCCG 
GCAGGGACCT 
CACCCTGAAA 
ATCATATGGA 
GCATGCAGTT 
TTTCTGGCTT 
GGCATGGTGG 
TGCCTGCTGT 
AGAAAGAGTA 
TATTACTCAA 
ATAATTTGGC 
TTGGAGATGG 
GTTCCTGGAT 
GGTCTCAAAT 
GATCTTAGGT 
GTTCAGACTC 
AATGTTGTAG 
AAGAAGGGGG 



ATCACTTTAC 
AGCCAGACTG 
TCCCAGGCTC 
ACAGGTGCAC 
GATGGGGTCT 
AAGCCATCCT 
GCCACCACAC 
TGGTAGACAA 
AATACATGAT 
GGGATCTGAG 
AAGTTTGTGT 
ATAAAACTCT 
TGATAATAAA
TGATTCAGTT 
GAAAGTAAGG 
TTTAACTGCC 
GTAGCACTGT 
TTAAATAATG 
GCCAATTAAC 
AGAACCAGTA 
CAGAGTAACT 
CTCCCTTCCA 
AACAGGTGAG 
AAAAGCAGAA 
ATTTAGCTAC 
ACCCTGATGT 
GTAATAGTCT 
AGTCTAGCTA 
AATTTTAGAG
GCATTGGGTC 
TAGCCTTTAG 
CTTCATGTTC 
GACCTAAATG 
TAGGAATACA 
CCACAGGTCA 
GTGGTTGAGA 
CTCTGCATAT 
GCTTTATGTG 
TCTCCATCCT 
AAAGACCGTT 
AAAATGATTT 
AATCACTATA
CCATAAATAT 
ATGCATATAG 
GCCAAGGTGG 
TATGATGGTG 
TTTTTTTCCC 
TCACTGCAAC 
CCTGAGTAGC 
ATTTTTAGTA 
CCATGAGCTC 
ACGGACATGA
AAAATGCAAG 
GTGTATGTAG 
ATGAGGTCCC 
ATCTCATTGT 
TTACATTTAG 
ACATTGAGAT 
ATCCTCATTC 
GAAGCAAGAG 
ATCTGGGGTG 
TGTCCATAAT 
CTGACTGCCA 
CATAGGACCA 
GGGATACCTC 
AAGATGTAAA
GCCATTTCAT 
CCTGTTAATT 
CATGCACTTG 
CTAGATAGAG
ATTTATGCAG 
ATCAGTCTCC 
CGGTAGGGGC 
AATCACAGGG 
GGGATGGCAG 
GATCCACCCA 
TTTACAACAG
TTGGAGCCAG 
CTAATTTGTT 
TCTTTTCAGA 
AAAGGGCTAT
TTCCCAAAGT 
AGGTTAGAAG 
AATTTCCTCA 
GGGAGGCTGA 
AGAGCTATGA 
CTGTCTCTAA 
AAGATGGTGT 
AGCTTGGTCT 
CCGAATGGGA 
ACTACCATTT
TATTTTCCTA 
TGTTAACCTC 
AGATGATGAA 
TGTGTGTTTG 
AAGCCATTTG 
ATTTTATGAC 
TACAAAGTAA 
CATTTTTTAT 
TACCATGTGT 
ATAATCTTCA 
AACTGTATCT 
GGAAGATCTG 
TGCGAAGCAT 
CTATTTTCTA 
GCTAAGAAAT 
TGGTTTGTGG 
CAAGCTTGTG 
GTGATCTTGC 
TATAGAGTTA 
TACTCCTGAC 
ACCTTTGCCT 
AAGGTTCTCA 
CACTTGTAGA 
ACTTTATTTT 
TGGATCCATT 
TTTACAGGAA 
AAATCCTGCT 
CATGCTTATG 
AAAAGTGGAA 
ACTTTCCCTC 
ATTTATTTAT 
TTGTGCAGGC 
GCCTCCCAGT 
CGCAGTTTTT 
AAATACAGAG 
CCCCAAAATA 
TGGGAAGAAA 
AAAGCAGGAC 
AAAACAAGAG 
GATGGCACTA 
AACTTGCCAC 
TCCAAAAATT 
TACTTGTTCC 
TAACAAGCTT 
CCATCCCAAC 
CATACTTTGA 
CCTATTAGAT 
ATATTTCAGT 
TTTGGGAAGC 
GCTACGGCAA 
TGTGGTCCCA 
GGAGGTTGAG 
GGTGACAGAG 
CCTTTTTGTA 
ATTCCTTAGT 
GTCTAAAATA 
TATATTACAT 
TTCCTCCCTC 
AATTTGTTCT 
CTATGATATC 
GGTAGTTATA 
TTTTGCAAGA 
TGTGTCTGGT 
TACTTTATTG 
GATTTTTTTA 
TGGAGTGCAG 
TCAGGTGATC 
ACACCACCAC 
CATCATGTTT 



TATCATTTTT 
TAGTTCAGCC 
CAAGATGGAG 
GTTATAATTT 
GACAGGAGGA 
TCACGCCACT 
ATAAATAAAT 
GCAATTAGAA 
TGCTCTGTCC 
ATAGAAGTGG 
AGTGGATGAA 
ATTCTAGTGG 
CTATGAACAG 
TTAGAAGGAG 
AAGAGAAGAA 
CAGTATAGTG 
TCATTATACA 
ATCAAAGTTA 
GTTATGAAAT 
TTGCTTAAAA 
AGATATTTAT 
GGTGCTAAAT 
TATGTCTAAA 
GACCTTGATT 
AGAATAATTC 
TTTGCAAAGA 
ATAGACTTCC 
GAACATATTA 
AAAAATGGAA 
AACTAAGCAT 
CTTTTTATCT 
TACGAACTCT 
GTCAGCTAGA 
TCCAAGAGAA 
ATTTGCATGA 
TTTTAGATAA 
GCCATACTGT 
TCTGAGGCCT 
TCTCCTTTGA 
TTTAAGCAGA 
TACTTTCAAG 
TTATTTATTA 
TGGTCTCAAA 
GTTGGGATTA 
TAAGAAAAAC 
GAAAGTATAT 
TGCCACTTTG 
CACATAGAAG 
ATGAATCTTA 
TTAATCACTG 
GAAGAATCTA 
CCCAGAGACT 
TGCTCTATAA 
CTGGTATTTT 
CTGTTTGTTT 
TAAGAACTAA 
TCTTGTTTAA 
TACTTTGAAG 
ATGTGCTAGG 
TGAGGCAGGA 
CAAAAAATCA 
GCTACATGAG 
GCTGCAGTAA 
TAAGACCATG 
AAAACACAAT 
ATCACCAAAT 
TTGTTGATAG 
TTGGTTGACA 
TCTTTCATCT 
ATAGTATTTC 
ATTTAGCATG 
CCTAGAAGCT 
ATTCTTTATT 
TTTGATCTTG 
GGGTGGGGGG 
ACTGTTATTT 
TGGCACAATC 
TTCTCACCTC 
ACCTGGCTAA 
CCCAGACTGG 



GTTTCAGAGT 
TACACCCAGG 
TCAATGAGGT 
TTGCAAAGGC 
TTAATGGAGC 
GCACTCCAGC 
AAGTAAATAA 
TTGAGCGATT 
CAGGTGGCTG 
TGATGAGGCA 
AACTTCGATC 
AGTAGATTAA 
TCAGTCCTCT 
CCTTAGATAG 
ATCAAGAGCT 
TGGATTTTGT 
AGACAAAATA 
TAATTGCCTA 
TGTAATTTAT 
ATCTCATGCT 
GAATAAAGTC 
CAGGAAATGT 
TATATGTCAG 
TTTATAGTCT 
CTAAAAGAAT 
GCGTACGTGA 
CAACAAAATT 
GTCATCTTTT 
TTTATCTTTC 
AGTAATTTCA 
CATCCAAATT 
TTGTATATGC 
AAAATGTGCA 
TTAGACTTAA 
CAGTCCTGTG 
GGAAGTTCAA 
AGTCCTATGT 
GCATACTTTC 
AAACATTGAT 
GAAACAAAAG 
AAGGAAAGTT 
TTTTAAAAAT 
CTCCTGGGCT 
CAGCATGAAC 
TTTTACTATA 
GAACCCACTT 
GCATAAGGAT 
AAAAGTTCTC 
AAAGTCCCCC 
AAGATAACTT 
TATTACATAC 
AAAAATCCTT 
GCTGGAGTTC 
CTGTTAACAT 
TTCTCCTGTT 
AGAGTAGGAG 
TCCGTAACCC 
CAAATTTCAG 
TGTGGTGGCT 
GGATCACTTG 
AAAACTTATC 
AGGCTGAGGC 
GCTGCATTCA 
TCTCAAAAAA 
ACTTTTATCA 
ATTTTGTCAG 
TTATTCAAAT 
AGTCTCTTAA 
CTTGTAATTT 
CTACATTATA 
TTCCTCTGTC 
TGAGTTTATT 
ATCTGCTTCT 
ACAGCTACTG 
AATAAGGTTT 
TGAGACAGTG 
ACGGCTCACT 
AGCCTCCTGG 
TTTTTTGTAT 
TCTTGAACTC 




CAAACCATGA ACTGAATTTC 
AATGAAGAAG GACAGCTTAA 
CTGATCTCTT TCACTGTCAT 
GGTTTCAGTC CCAGCTACTT 
CCAGGAGTTT GAGGTTGCAG 
CTGGGTGACA GAGTGAGACC 
ATAAATACAT AAATAAAATC 
TTGTTTCCAA ACCTCAAGAA 
GATAAATTGG GCCTGTCAGC 
AGTATTCTTT GGAGCAGGAA 
CTTTACCTGT AAGTGACCAT 
AGTCAACTCA GGACCTCTGG 
CAGTAACTAG CCAAATCATG 
CATCCAATCT AACATTTTTT 
AGGAATAACT TTTTAAAGGT 
TTAAAAGGGG ATAATTTGAA 
AGTTGGATTT TCAAATGTTT 
CAGTACGCAA AGCTTCAAAA 
TTAACCTTAA AATGAGCCAG 
AAGAATTTAC TATGTTGTTA 
TTATTTCTAA TCCTTCCTCC 
TTCTTCCCAA AAAGCCTCGT 
GGATAATACA GATGTAGCCC 
AAAATGTCAT TTGCAGATAT 
TATTTGAATG TTGTAGGAAA 
AAATATAAGC TAGGCTTTTG 
GCTTTTTATC TATAGTGATC 
TTTAGAAAAT TCTTAGAAAA 
CCCAAGTATA TTCTGTCATG 
CCAGACAAAC ATTCAAAATC 
TTCCCAGGGC CGAGACATAA
ACTAAATATG CTTCTCCTTC 
AGAGTAAATG GTACCCTTCT 
ACTCACTCTA CATGTCTGTG 
AGGTGGCAAG GCAGGTATCT 
ATTGAGAAGA GGTTGCATGA 
TACTCTTAAA AATCCCATTC 
TACCCTACCA GTCATTGACC 
TCCACTCTTG TCTCCAGTGA 
CCATTTGTCT TGTTAAGTCT 
GGGGTATGTG TTGAATGGTG 
TGATACAAGG TCTTACTGTA 
CAAGTGATCA TCCCACCTCA 
CATTGTGCCC ACCACCGATC 
GAAAATTTTA ATCATATACA 
TAGGAGACTA GAATATGCCA 
TATTTCGAGC TAAAGGCAAC 
TGTCCTTCTC CATTTGCCTA 
TCCTTCCCTT TCTACCAGGA 
CAGACCCTTA TCAGTGTAGA 
TCATTTATTT TCCTTCCCAC 
TTCCTTTGTC ATGTCTCTTG 
TAAGCCACCT CTTTGAGAAT 
ACATGTATTA ATATACATGT 
TTCTGTCTTG TTACAGAGGT 
GAAAATATAA TTTCCTCCTG 
TTCCCACTTT TCACCTCCTA 
ATATATTACT TTATCTATAA 
CACACCTGTA ATCCCAACAC 
AGCCCAGGAG TTCAAGACCA 
TGGGCATGGT GGCACATGCC 
AGGAGGATCG CTTTAGCCCA 
CACCACTGCA CTCCAGCCTG 
ATACATATTT TAGTATGTAT 
TACTTTAAAT AATAACAATA 
TGTCTCACAT TTTCCTTATT 
CAGAATCCAA ACAAGGTCCA 
GTTTGTTCAT CTTTAAGTTC 
ATTAATGTGA AAAAACAGGT 
GAGTTTGCTA CATTTATTCC 
CCCTGTGTTT CCTGTAAACT 
CAGGTTTTTA ATTGTATTTT 
GGAAGCACAG AATGTCTGGT 
ATGACCATTG CCTAATCCAT 
TAAAATAAAT TTTTTTTAAA 
TCTCATTTCG TTTCCCAGGC 
GCAGCCTTGA CCTCCTGGGA 
GTACCTGGAA CTAC lAGGTGC 
TTTGTGTACA GAAGGGGT1 l 
CTGGGTTCAA GTGATCTACC 



30800 
30850 
30900 
30950 
31000 
31050 
31100 
31150 
31200 
31250 
31300 
31350 
31400 
31450 
31500 
31550 
31600 
31650 
31700 
31750 
31800 
31850 
31900 
31950 
32000 
32050 
32100 
32150 
32200 
32250 
32300 
32350 
32400 
32450 
32500 
32550 
32600 
32650 
32700 
32750 
32800 
32850 
32900 
32950 
33000 
33050 
33100 
33150 
33200 
33250 
33300 
33350 
33400 
33450 
33500 
33550 
33600 
33650 
33700 
33750 
33800 
33850 
33900 
33950 
34000 
34050 
34100 
34150 
34200 
34250 
34300 
34350 
34400 
34450 
34500 
34550 
34600
34650 
34700 
34750 



CACTTCAGCT 
CCTAAATGAA 
TTTGTCTTTG 
TGGCTTTGAG 
TTAAACTCTG 
GAATTGTGTC 
ATTCATGGTG 
TTTAAATGCC 
CTTGCCACTG 
CTCTTTCTCC 
CAAGAAATTG 
AGAGAAGGAA 
GTATGAAACA 
AACTTTACTC 
GGCAGTTGCA 
TTTTAGAAGA 
CTCGCTCGTC 
AACCTCCGCC 
ACAACAATAT 
GACATCGAGA 
GTTCCTAAAT 
TGATAAGATG 
ATTTCCGATT 
GAAGTAGCGA 
TAGATTTAAT 
CCCCTGTTTT 
GCCTTTTACC 
CAACCCAGCT 
TTGGTAGCAT 
CACTAGCGGT 
GTCATTATGG 
TACTAATACT 
AGATACATAT 
AAGAAGGAGA 
TACTTGCGGT 
TCTAAGACCT 
TTGTTCATTC 
AGTTTGGACA 
TTCTTAATAT 
ACTCATCCTA 
ATCTTAGAGA 
ACTCAACGCA 
ACATTCACTA 
TGGGTTGGTA 
AGCATTAATT 
TTCTTAAGTA 
ACATCTTATA 
TCTTAAAGAG 
TTGGAAGGAG 
AGTTTACAGG 
GAAGCTGAAG 
GGCAATATGG 
TGGTGGTGCA 
ATTGCTTGAG 
CTGTCACCCA 
TAAATAAAAA 
GCAAATGCCA 
AGGCAGTACA 
ACATTATTTA 
ATTTTACTAT
CGGGGGAGAG 
GTAGACATTT 
GAGTCTCACT 
ACTGCAACCT 
CTAGTAGCTG 
ATTTTTAGTA 
TCCTGACCTC 
ACACGAGTGA 
TCATGTTTTA 
CCCTAAATTC 
AAATTTTGCA 
CGCTTACTCA 
AAATTATTAC 
CTCCCAGGCT 
TACCTGGGCT 
CACAGGCTTA 
CGATGTCTCA 
GATCTTCCTC 
ACCCAGCCCT 
ATTTAAATGA 



TCCCAAAATC 
ATTATTTGTC 
TGTGTACATG 
CTTTGCTTTG 
ATCATTCTTG 
CTCCAGTGAT 
GTCACATGTG 
CCCAGGATAA 
GTTTCATTAA 
CCAACCACCA 
GTGGGCACCA 
GCTTCGAGTA 
CACCCTTTAC 
AAACACCCTG 
ATTTAGTAAA 
AAAATGCTAC 
ACCCAGGCTG 
TCCCGGGTTC 
TATTTTCAAA 
TTTTTGTAGC 
CTCAGGAATA 
ATGCTGAAAT 
GTTGGGGACT 
GGGGAATGGT 
TTTCTTATAC 
TATGGTTTAT 
TTCCTATGTC 
GGCAGAGCTG 
GAACGGCAAC 
CTAAAACGAT 
ATCCTAATAC 
TAGGATCACA 
TTCTATTAAG 
TTTAACTCTG 
TACCCTATCC 
TTGGGACCTC 
CAAACTTTCA 
GGGAGCAAAA 
TCAGGAAATT 
AGAGTCTAAA 
ATAAGTTTGC 
TTTAAATTAT 
AAGC AAAAT A 
TAAAATATCA 
TTTATTGGTT 
GATTCTCATA 
AAAGCCTGTA 
TTAGGCATAT 
TTCCTTTCTC 
CTTGGCGCAG 
CAGGCAGATC 
CAAAACTCTC 
TGCCTGTAGT 
CCCAGGGGGG 
GCCTGGGTGA 
TTAAGAGTTT 
CATAAGTGAT 
GTAAGCACGC 
CACTGGGTAC 
AACTATAATC 
ATCCAGAAGT 
TTTCTTTCTT 
CTGTTGTCCA 
CCGCCTCCTG 
GGATTAGAGG 
GAGATGAGGT 
AAGTGATCCA 
GCCACCGTGC 
TAATTGGAAA 
TCTTTGATGA 
AAATAGTATC 
TATTAATGAC 
TATCATTATC 
GGAGAGTAGT 
CAAGTGATCC 
TGCTACCACA 
TTATGTTGCC 
AGCCTCCCAA 
AAAAATTATT 
ACATCTGGTT 



CTGGGATTAC 
TCTAAACAGA 
TGTTTGTGTA 
AATTCTTGGA 
AC AG AT AT CC 
AAAAAGCAGC 
AGGTGAAAAA 
CAGTGATACT 
ATAAGGACAT 
CAACTAGGAT 
AGGTGTTAAT 
TACCTTCATT 
CAATCATCAA 
TTGCATGTGT 
GTTTTATACA 
TTTTGTTGTT 
GAGTGCAGTG 
AAGTGATTCT 
AGTTGTGACC 
CTCATACTCT 
TTCTCTAGAT 
ACTAATTCTA 
GGGAACTCTG 
TTGAATGGAT 
ATTTCAGTCT 
AATTTGAATT 
TGAAAATGGA 
TGAGGATCTC 
ATTTTTAATT 
CATAAAAGAA 
TTAGGATGCA 
TTTGTAATTG 
TTAACCTCTT 
TATGCCATAA 
TTTTTCTAAC 
ATGGATTACT 
ATAAATTTAT 
GACAAAGTCA 
TATGTATGAA 
GCAAAAGGAT 
ATTTCAAAAT 
TTACTCTAAA 
TACCTTTATA 
TACCATGTGA 
AGAGTAAGAA 
CACTTTGGTT 
TTCAATGGAG 
AAATATTTTA 
ATGACTATTC 
TGGCTCATGC 
ACTTCAGCCC 
TCTACAAAAT 
CCCAGCTACT 
TCATGGCTGC 
CAGAGTGAGA 
ACAAAATTCT 
GTGTTCCAGG 
TTTCTCCAAA 
TGCTCTTTTA 
ATATAACATG 
CTTCCCAAGA 
CTTTTTTTTT 
GGCTAGAGTG 
GGTTCAAGCA 
CATGCATCAC 
TTCACCATGT 
CCTGCCTTAG 
CCTGCCCCTA 
ACTGGTGAAA 
GTATATATTA 
CTAGATAAGT 
CTCGGAGAGT 
ATTATTTTTG 
GGTGCGGTCA 
TTCCTCCTCA 
CCTGGCTAAT 
CAGGCTGGTC 
AGTGCTGGGA 
AGGGTCCTGC 
TTTTTAAAAA 



153 

ACTTTGGCCA CCGTGCCTGG 
CAGAAGTTTT ACTTTAAAAA 
TGTGTGTGTG TCTAAAAGTT 
TGAACAATAA CCAAGAATAC 
CCTACAGGCT ATGGCCTTTT 
AAGCACGATA CTGCTCTCAG 
AAAAAAAAAG ATGAATCCTA 
CTTTGTAGGA TAACTATTTG 
AAGTAAAGAT CTATTTTTGT 
TATTGGCTAT CTCTTCTGTT 
GGCAAGCGTG CAAGGTTCAA 
GCACAAACAC TGACAAGTAA 
GTTTTAGTGG GTAAGCCTGT 
CTATACATTG CATAAGTATA 
ACGATTTTAT TTTATTTTAT 
GTTGTTTTTT GAGACGGGGC 
GTGCAATCTC AGCTCACTGC 
TGAAGAGGAG AACAATAATA 
GCAGTTTCTG GAGTTGAGAA 
TGCTTTAGGT AGCAAAAAAT 
AGGTTTCAAT CTATCATTCC 
GCCAAAAAAG ACCAGCTACC 
GATAGTGAGG ACCCCAGTAG 
AAATTCATAA AAAATGTCAG 
TTTTATAAGG CTAGGAAAAG 
CACATGAACC C AC AAAAT TT 
TAGTCTGGCT GGCCTCTTAA 
AGTGTGCTCT AGCCCAGACA 
GTGTTTTCAA AATAGGAGCA 
GGATACTAAG AGGGCCCACT 
TTATGGATTG TCATTATGGA 
AGTTTTTAAT TGCTTAAATT 
TGCTTTTAGT CCAAGGTATA 
ACCTCCATAA TGTCACCAAG 
AAGCAAGTGG ATAAATACCT 
TTCCAAGTAA GTAATTTTCC 
TGGTGTTTAT C AG AAT AG AG 
ACTATATCAA GTTCTAATAA 
TACTTACTAA TATGAGTATA 
GTGAACACAA ACTAGCAGTT 
AACTTGACAT ATCAAGATCC 
AAGACATAAT TCTTGGTAAC 
TAATTGCTAT CAAAGGTATG 
GATCAGTGTG ATTCCTTTAC 
AAAGAATAGC TAGAGTATAT 
TCAAAAACCA ATTATTGACT 
TGCCAAAAAA TGACTATGAG 
AGGTTTCTGT TCAATGTATG 
TCATATTGGA GCATAAAAAG 
CTGTAATCCC AATACTTTGG 
AGGAGTTTGA GACCAGCCTG 
ATACC AAAAT TAGCCAGGCG 
TGGGAAGCTG AGGTGGGAGG 
AGTGAGCTGT GATGGTGCCT 
CCCTGTCTCA AAAAAATAAA 
CACCATCTCC TCCCATCTTT 
ACTATTAGCC TCGGAACCTG 
GTCCTGTCCC CCACAGACAA 
TTTTTTCCCC TCTATGCTTT 
TAATAGGAAA AAGGCAGGGT 
GCCTTTCCAA CATAGCCTCT 
TTTTTTTTTT TTCTGAGACA 
CAGTGGCGTG ATCTAGGCTC 
ATTCTCCCAC CTCAGCCTCC 
CACGCCTGGC TAATTTTTGT 
GGGCCAGGCT GGTCTTGAAC 
CCTCCCAAAG TGCTAGGATT 
TTACATTCTG ATCACACATT 
T T AT AG AC AA TGTTTTGTTC 
CTTACACTCT TCTGTCTTTA 
TTATGAGTGC ACAGTCTGTA 
TAAACAACAG TCACCTTTAA 
AGGCGGGGGT CTCATTCTGT 
CAGCTCACTG CAGCCACCGC 
GCCTTCTGAG TAGCTGAGAC 
TTTTTAACTT TTTGTAGAGA 
TCAAACTCCT AAGCTCAAGT 
TTACAGGCAT GAAAAACTGC 
ATAGTAAGAC TTTAATAAAT 
AAAAATAGAG ACAAGGTCTC 



34800 
34850 
34900 
34950 
35000 
35050 
35100 
35150 
35200 
35250 
35300 
35350 
35400 
35450 
35500 
35550 
35600 
35650 
35700 
35750 
35800 
35850 
35900 
35950 
36000 
36050 
36100 
36150 
36200 
36250 
36300 
36350 
36400 
36450 
36500 
36550 
36600 
36650 
36700 
36750 
36800 
36850 
36900 
36950 
37000 
37050 
37100 
37150- 
37200 
37250 
37300 
37350 
37400 
37450 
37500 
37550 
37600 
37650 
37700 
37750 
37800 
37850 
37900 
37950 
38000 
38050 
38100 
38150 
38200 
38250 
38300 
38350 
38400 
38450 
38500 
38550 
38600 
38650 
38700 
38750 



ACTATATTGC CCAAGCTGGT CTCGAACTCC TGGACTCACG CAATCCTGCT 38800 
GCCTTAGCCG CCCAAAGTGC TGGGATTACA GGCATGACCC ACCTCATCTG 38850 
GGCTGAGTGA ACATATTTTT AACATAAAGG CCGTATTTTA TATTTATCTC 38900 
ATACATTTTG CCCAGCATCC CCATTTCCGC CGAATCTGTT GCTTGCTAAT 38950 
TCCTTCCAGC TTCATTTCAT CTGAAATTTG ACAAACATCT TCTATTTCTT 39000 
TGTCGTCATG TTATTGACTT CAGAATATAA AATAAAACAC TATACCCAAA 39050 
TTAAACCCCA CCCTCATTGC CCAGCCTGAT GTGAAAATAA TCAGCATACA 39100 
TTAAGCTTAC CCTTGATATA TGTGTAGCAT CTTTTAGATA AATATACAGC 39150 
TGATTAAGCA ATATAGCCTG ATGGTATAAT ATCTTGCCCA TGTACCTCAT 39200 
CTTATCTCCA GCAGGATTAA TTCACAGTGA TCAGATTTAC CTTTAAACTT 39250 
TGTAGCAAAA TATCCTCTCC AAAAGCATAT CTAAAACTTT TGTGTGTACT 39300 
CTTGCAAGTT TCTTAATTTC ATGCAGAACA GGCTCTTACC ACTGTTAGCT 39350 
GGAGATATTT TCAAGACCTA TTTTTGTTTG TGGTTTCCTG ATGATGGTCA 39400 
TGGCATTTCC CCCTTCACTC CATCTAAAAA TTGAGGTGAT ACAGGCTTTT 39450 
AAACAAAACC AACTCATATA GACTGAGTAC AACTGCAATG CAGGCATGCT 39500 
AACCTCTGCT ACAATCATGG GCGTGCTATT GATATGTCTT AAGTTACAGA 39550 
ACACAGGGCT GAGCGTCTCA TTAGGTCAAA ATGTAAACCA GTTTTTCTGC 39600 
TCACTGATGC TTAATGAGGA CAGGGTGTGA GAGATTTCTT TAAGGAAAAC 39650 
AAATATATAA TAATGCTACA TGGAAAAATA TCTAACATTA GAGAATTAAG 39700 
TAAATAAACT AATATACTCA CACCATGGAA TCTTGTGCAG ACATTAAAAT 39750 
TATGTAGTGG ATGGATGTTT AATGGTGTGA GAAAAAGTTA GGATGTGCTG 39800 
GGGTGGGGGG AAGAATCAAG TTTTAAGAAA ATACAGTATA CCCATACTTA 39850 
AGTAAAAAAA AAAAAAAAGG TATGTACAGT CATGTGTTGC TTAATGATGG 39900 
GGATACATTC CGAGAAATGT GTCGATAGGT GATTTCATCC TTGTGTGAAC 39950 
ATCATAGAGT GAACTTACAC AAACCTAGAT GGTCTAGCCT ACTATGTATC 40000 
TAGGCTATAT GACTAGCCTG TTGCTCCTAG GCTACAAACC TGTAAAGCAT 40050 
GTTACTGTAG CGAATATACA AATACTTAAC ACAATGGCAA GCTATCATTG 40100 
TGTTAAGTAG TTGTGTATCT AAACATATCT AAAACATAGA AAACTAATGT 40150 
GTTGTGCTAC AATGTTACAA TGACTATGAC ATTGCTAGGC AATAGGAATT 40200 
ATAATTTTAT CCTTTTATGG AACCACACTT ATATATGCGG TCCATGGTGG 40250 
ACCAAAACAT CCTTATGTGG CATATGACTG TATACATGTA CACAAAAAAT 40300 
AGATGAAAGA ATGAATATAC ATCAAAATAT TTAAAATGGT TATAATGACT 40350 
TAGGTTACTT TTATTTATCT TAGTAATAAT AATGATGATA GATAATACTT 40400 
TTATAGTGTT TACTATATAA AAGACACTGT TATAAGTGTT CTACATACTT 40450 
TACATGTATT ACCTAAATGA TATAAATATA ACTCTGACAG TAACTAATCT 40500 
TATACGTTCT CTTTTCTTTT TTTTTTTTTT CTTTTTTTAG ACAGAATCTT 40550 
GCTCTACCAG GCTGGAGTGC AGGGTGCAAT CTCGGCTCAC TGCAACCTCC 40600 
GCCTCCCAGG TTCAAACGAT TCTCATGTCT CAGCCTCCTG AGTAGCTGGG 40650 
ACTACAGGCA CACACCACCA TGCCCGGCTA ATTTTTGTAT TTTTGGGTAG 40700 
AGATGGAGTT TTGCCATGTT GGCCAGGCTG ATCTTGAACT CCTGGCCTCA 40750 
AGTGATCTGC CTGCCTCAGC CTCCCAAAGT GCTGGGATTA CAGGTGTGAA 40800 
CCACTGTGCT CGGCCTAATC TTACAAGTTT TCAATATTTA AAGAGTGCTA 40850 
ACTTTGTTGA CAATATAAAA CATATTTGAG AAAAAGAGAT ATAAGCATCT 40900 
TATTTAGAAT TATGAAAATA TCAATAGACC TACAGCCGAC TAAAGCTTTT 40950 
CTTCATAAGC TCTTGCCTAT ATTGATTCGC TCCTGTGAAT ATGCATTAAT 41000 
TTGATTTAAA TAATAAGTAT GTATAAGAAA TAACACTTTT CCTTAATTTT 41050 
TAAGAACGTT CAACAGTTTT TAATTTGAAT TCCAATAGTG AAATACATAG 41100 
AAAATATAAA ATTTTCTGTA GTTTAGCCAA ATTGTTTTTG TTTCACCACA 41150 
GCATTCTACC AAAATTTCTT AATAACAGTA AGAAAATGAA TGCATACCTC 41200 
CTGCAGGGAG AGGGGAGTTA GGCAGTTTAT GGGCATAGTT ACAAGTGAGA 41250 
AATTTCATTG GCTACCATTT ACGCTAAATT CATAAAAACT GCATTCAATT 41300 
CTATATATCT ATTTTCTTTA CATAAAAAAG GTTTCAATTA TTGGCCATTA 41350 
AATAAAATAG CCACCATTCC AGAAGTTGTG TCATGTTTAT CCTTTTTATA 41400 
CCACCATCAT ATTGCCTATT ATATAGATTG TGTGTGTTCC ATTTTCTGTA 41450 
ATGGGCCAGA CAGTAAGTAT TTCTGGCTTT GGAGTCCATA TGGTCTCTAT 41500 
CATAACTACT CATCTCTGCC ATTGTAGCTT AAAGATTATC TAGGTCAAAT 41550 
GCCTAAGTGA TATAGTGTTG AAATACAAGT TATATAATAT AGGCTGCCAC 41600 
AAAAAAAAAT TTATTTGGTC TAAAAAAGAT TTCATGACTT TTGTAGCAGC 41650 
ATGGGTGGGG CATGCACCAC TTGGTTAACT CGGTGTATCT TTCTCCTTTG 41700 
CAGATCTGTC CAACTCAATG GTCTAACTCT AAAGATGGTG GATGATCAAA 41750 
CCTTGCCACC TTTAATGGAA AAACCTCTCC GGCCAGGAAG TTCACTGGGC 41800 
TTGCCAGCTT TCTCATATAG TTTTTTTGTG ATAAGAAATG CCAAAGTTGC 41850 
TGCTTGCATC TGAAAATAAA ATATACTAGT CCTGACACTG AATTTTTCAA 41900 
GTATACTAAG AGTAAAGCAA CTCAAGTTAT AGGAAAGGAA GCAGATACCT 41950 
TGCAAAGCAA CTAGTGGGTG CTTGAGAGAC ACTGGGACAC TGTCAGTGCT 42000 
AGATTTAGCA CAGTATTTTG ATCTCGCTAG GTAGAACACT GCTAATAATA 42050 
ATAGCTAATA ATACCTTGTT CCAAATACTG CTTAGCATTT TGCATGTTTT 42100 
ACTTTTATCT AAAGTTTTGT TTTGTTTTAT TATTTATTTA TTTATTTATT 42150 
TTGAGACAGA ATCTCTCTCT GTCACCCAGG CTGGAGTGCC ATGGTGCGAT 42200 
CTTGGCTCAC TGCAACTTTA AGCAATTCTC CTGCCTCAGC TTCCTGAGTA 42250 
GCTGGGATTA TAGGCGTGTG CCACCACGCC CAGCTACTTT CTATATTTTT 42300 
TGTAGAGATG GAGTTTCGCC ATATTGGCCA AGCTGGTCTC GAACTCCTGT 42350 
CCTCGAACTC CTGTCCTCAA GTGATCCACC CGCCTCAGCC TCTCAAAGTG 42400 
CTGGGATTAC AGGTGTGAGC CACCACACCC AGCAGTGTTT TATTTTTGAG 42450 
ACAGGGTATC ATTCTGTTGC CCAGGCTTGA GTGCAGTGGT GCAATCATAG 42500 
ATCACTGCAG CCTTTTAACT CCTGGGCTCA AGTCATCCTC CTGCTTAGCC 42550 
TCCCAAGTAG CTAGGACCAC AGACACATGC CATCACACTT GGCTATTTTT 42600 
AAAAAATTTT TTGTAGAGAT GGGGTCTCGC TATGTTACCC AAACTGGTCC 42650 
TGAACTCCTG GACTCAATTG ATCCTCCCAC CTTGGCCTTC CAGGTGCTGG 42700 
GATTTCTTTG GGAGTACAGC ATGGTACAGC AGGAGATCAT TTGATGTTAC 42750 
CTCTGTGCAG TGTTGCTAGT CAGCGAAAGA CTATAATACC TGTGGGGACA 42800 
GCGATTAGCC ACCACAACCA GTCTTTATTT AAAGTTATTA AAAMGGCTG 42850 
GGCGCAGTGG CTCACACCTG TAATCCTAGC ACTTTGGGAG GCCGAGGCAG 42900 
ATGGATCACC TGACGTGAGG AATTTGAGAC CAGCCTGGCC AACATGGTGA 42950 
AACCCCATCT CTACTAAAAA ATACAAAAAT TAGCTGGGTG TGGTCCTGTA 43000 
GTCCCAGCTA CTTGGGAGGC TGGGGCAGGA GAATTACTTG AACCCAGGAG 43050 
GCAGAGGTTG CAGTGAGCCG AGATTGTGCC ACTGCACTCC AGCCTGGGTG 43100 
ACAGAGAGAG ATTCCATCTC AAAAAAACAA GTTATTAAAA ATGTATATGA 43150 
ATGCTCCTAA TATGGTCAGG AAGCAAGGAA GCGAAGGATA TATTATGAGT 43200 
TTTAAGAAGG TGCTTAGCTG TATATTTATC TTTCAAAATG TATTAGAAGA 43250 
TTTTAGAATT CTTTCCTTCA TGTGCCATCT CTACAGGCAC CCATCAGAAA 43300 
AAGCATACTG CCGTTACCGT GAAACTGGTT GTAAAAGAGA AACTATCTAT 43350 
TTGCACCTTA AAAGACAGCT AGATTTTGCT GATTTTCTTC TTTCGGTTTT 43400 
CTTTGTCAGC AATAATATGT GAGAGGACAG ATTGTTAGAT ATGATAGTAT 43450 
AAAAAATGGT TAATGACAAT TCAGAGGCGA GGAGATTCTG TAAACTTAAA 43500 
ATTACTATAA ATGAAATTGA TTTGTCAAGA GGATAAATTT TAGAAAACAC 43550 
CCAATACCTT ATAACTGTCT GTTAATGCTT GCTTTTTCTC TACCTTTCTT 43600 
CCTTGTTTCA GTTGGGAAGC TTTTGGCTGC AAGTAACAGA AACTCCTAAT 43650 
TCAAATGGCT TAAGCAATAA GGAAATGTAT ATTCCCACAT AACTAGACGT 43700 
TCAAACAGGC CAGGCTCCAG CACTTCAGTA CGTCACCAGG GATCTGGGTT 43750 
CTTCCCAGCT CTCTGCTCTG CCATCTTTAG CGCTGGCTTC ATTCTCAGAC 43800 
TCTGGTAGCA TGATGGCTGT AGCTGTTTCA TGGGCCCCTT CAAACCTCAT 43850 
AGCAACCAGA GGAAGAAAAT GAGCCATTTT TTGAGTCTCC TTCATAGACT 43900 
TGAATAACTC TTTTTCAGAG CTTCTCACAG CAAACCTCTC CTCATGTCTC 43950 
CTCATGTCTT ATTGTTCAGA AATGGGTAAT GTGGCCATTT CACCAGTCAC 44000 
TGCCAACAAC AACGAGGTTC CTATAATTGT CTCTGAGTAA CCCTTTGGAA 44050 
^GGaSt G^TGGTCAGT CTACAAACTG AACACTGCAG TTCTGCGCTT 44100 
TTTACCAGTG AAAAAATGTA ATTATTTTCC CCTCTTAAGG ATTAATATTC 44150 
TTCAAATGTA TGCCTGTTAT GGATATAGTA TCTTTAAAAT TTTTTATTTT 44200 
AATAGCTTTA GGGGTACACA CTTTTTGCTT ACAGGGGTGA ATTGTGTAGT 44250 
GGTGAAGACT CGGCTTTTAA TGTACTTGTC ACCTGAGTGA TGTACATTGT 44300 
ACCCAATAGG TAATTTTTCA TCCATTACCC TCCTTCCGCC CTCTTCCCTT 44350 
CTGAGTCTCC AACATCCCTT ATACCACTGT GTATGTTCTT GTGTACCTAC 44400 
AGCTAAGCTT CCACTTATAA GTGAGAACAT GCAGTATTTG GTTTTCCATT 44450 
CCTGAGTTAC TTCCCTTAGG ATAACAGCCC CCAGTTCCGT CCAAGTTGCT 44500 
rr AAAATACA TTATTCTTCT TTATGGCTGA GTAATAGTCC ATGGTACATA 44550 
TATACCACAT TTTCTTTATC CACTTATCAG TTGATGGACA CTTAGGTTAA 44600 
TTCCATTCAA TTTCATTCAA TTTAAGTATA TTTGTAAGGA GCTAAAGCTG 44650 
AAAATTAAAT TTTAGATCTT TCAATACTCT TAAATTTTAT ATGTAAGTGG 44700 
TTTTTATATT TTCACATTTG AAATAAAGTA ATTTTTATAA CCTTGATATT 44750 
GTATGACTAT TCTTTTAGTA ATGTAAAGCC TACAGACTCC TACATTTGGA 44800 
ACCACTAGTG TGTTGTTTCA CCCCTTGTTA TACTATCAGG ATCCTCGA 44848 



(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2396 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43 

TTTCTAGTTG CTTTTAGCCA ATGTCGGATC AGGTTTTTCA AGCGACAAAG 50 
AGATACTGAG ATCCTGGGCA GAGGACATCC TAGCTCGGTC AGATTTGGGC 100 
AGGCTCAAGT GACCAGTGTC TTAAGGCAGA AGGGAGTCGG GGTAGGGTCT 150 
GGCTGAACCC TCAACCGGGG CTTTTAACTC AGGGTCTAGT CCTGGCGCCA 200 
AATGGATGGG ACCTAGAAAA GGTGACAGAG TGCGCAGGAC ACCAGGAAGC 250 
TGGTCCCACC CCTGCGCGGC TCCCGGGCGC TCCCTCCCCA GGCCTCCGAG 300 
GATCTTGGAT TCTGGCCACC TCCGCACCCT TTGGATGGGT GTGGATGATT 350 
TCAAAAGTGG ACGTGACCGC GGCGGAGGGG AAAGCCAGCA CGGAAATGAA 400 
AGAGAGCGAG GAGGGGAGGG CGGGGAGGGG AGGGCGCTAG GGAGGGACTC 450 
CCGGGAGGGG TGGGAGGGAT GGAGCGCTGT GGGAGGGTAC TGAGTCCTGG 500 
CGCCAGAGGC GAAGCAGGAC CGGTTGCAGG GGGCTTGAGC CAGCGCGCCG 550 
GCTGCCCCAG CTCTCCCGGC AGCGGGCGGT CCAGCCAGGT GGGATGCTGA 600 
GGCTGCTGCT GCTGTGGCTC TGGGGGCCGC TCGGTGCCCT GGCCCAGGGC 650 
GCCCCCGCGG GGACCGCGCC GACCGACGAC GTGGTAGACT TGGAGTTTTA 700 
CACCAAGCGG CCGCTCCGAA GCGTGAGTCC CTCGTTCCTG TCCATCACCA 750 
TCGACGCCAG CCTGGCCACC GACCCGCGCT TCCTCACCTT CCTGGGCTCT 800 
CCAAGGCTCC GTGCTCTGGC TAGAGGCTTA TCTCCTGCAT ACTTGAGATT 850 
TGGCGGCACA AAGACTGACT TCCTTATTTT TGATCCGGAC AAGGAACCGA 900 
CTTCCGAAGA AAGAAGTTAC TGGAAATCTC AAGTCAACCA TGATATTTGC 950 
AGGTCTGAGC CGGTCTCTGC TGCGGTGTTG AGGAAACTCC AGGTGGAATG 1000 
GCCCTTCCAG GAGCTGTTGC TGCTCCGAGA GCAGTACCAA AAGGAGTTCA 1050 
AGAACAGCAC CTACTCAAGA AGCTCAGTGG ACATGCTCTA CAGTTTTGCC 1100 
AAGTGCTCGG GGTTAGACCT GATCTTTGGT CTAAATGCGT TACTACGAAC 1150 
CCCAGACTTA CGGTGGAACA GcTCCAACGC CCAGCTTCTC CTTGACTACT 1200 
GCTCTTCCAA GGGTTATAAC ATcTCCTGGG AACTGGGCAA TGAGCCCAAC 1250 
AGTTTcTGGA AGAAAGCTCA CATTCTCATC GATGGGTTGC AGTTAGGAGA 1300 
AGACTTTGTG GAGTTGCATA AACTTcTACA AAGGTCAGCT TTCCAAAATG 1350 
CAAAACTCTA TGGTCCTGAC ATCGGTCAGC CTCGAGGGAA GACAGTTAAA 1400 
CTGCTGAGGA GTTTCCTGAA GGCTGGCGGA GAAGTGATCG ACTCTCTTAC 1450 
ATGGCATCAC TATTACTTGA ATGGACGCAT CGCTACCAAA GAAGATTTTC 1500 
TGAGCTCTGA TGCGCTGGAC ACTTTTATTC TCTCTGTGCA AAAAATTCTG 1550 
AAGGTCACTA AAGAGATCAC ACCTGGCAAG AAGGTCTGGT TGGGAGAGAC 1600 
GAGCTCAGCT TACGGTGGCG GTGCACCCTT GCTGTCCAAC ACCTTTGCAG 1650 
CTGGCTTTAT GTGGCTGGAT AAATTGGGCC TGTCAGCCCA GATGGGCATA 1700 
GAAGTCGTGA TGAGGCAGGT GTTCTTCGGA GCAGGCAACT ACCACTTAGT 1750 
GGATGAAAAC TTTGAGCCTT TACCTGATTA CTGGCTCTCT CTTCTGTTCA 1800 
AGAAACTGGT AGGTCCCAGG GTGTTACTGT CAAGAGTGAA AGGCCCAGAC 1850 
AGGAGCAAAC TCCGAGTGTA TCTCCACTGC ACTAACGTCT ATCACCCACG 1900 
ATATCAGGAA GGAGATCTAA CTCTGTATGT CCTGAACCTC CATAATGTCA 1950 
CCAAGCACTT GAAGGTACCG CCTCCGTTGT TCAGGAAACC AGTGGATACG 2000 
TACCTTCTGA AGCCTTCGGG GCCGGATGGA TTACTTTCCA AATCTGTCCA 2050 
ACTGAACGGT CAAATTCTGA AGATGGTGGA TGAGCAGACC CTGCCAGCTT 2100 
TGACAGAAAA ACCTCTCCCC GCAGGAAGTG CACTAAGCCT GCCTGCCTTT 2150 
TCCTATGGTT TTTTTGTCAT AAGAAATGCC AAAATCGCTG CTTGTATATG 2200 
AAAATAAAAG GCATACGGTA CCCCTGAGAC AAAAGCCGAG GGGGGTGTTA 2250 
TTCATAAAAC AAAACCCTAG TTTAGGAGGC CACCTCCTTG CCGAGTTCCA 2300 
GAGCTTCGGG AGGGTGGGGT ACACTTCAGT ATTACATTCA GTGTGGTGTT 2350 
CTCTCTAAGA AGAATACTGC AGGTGGTGAC AGTTAATAGC ACTGTG 2396 

(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 535 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44 

Met Leu Arg Leu Leu Leu Leu Trp Leu Trp Gly Pro Leu Gly Ala 

5 10 15 

Leu Ala Gln Gly Ala Pro Ala Gly Thr Ala Pro Thr Asp Asp Val 

20 25 30 

Val Asp Leu Glu Phe Tyr Thr Lys Arg Pro Leu Arg Ser Val Ser 

35 40 45 

Pro Ser Phe Leu Ser Ile Thr Ile Asp Ala Ser Leu Ala Thr Asp 

50 55 60 

Pro Arg Phe Leu Thr Phe Leu Gly Ser Pro Arg Leu Arg Ala Leu 

65 70 75 

Ala Arg Gly Leu Ser Pro Ala Tyr Leu Arg Phe Gly Gly Thr Lys 

80 85 90 

Thr Asp Phe Leu Ile Phe Asp Pro Asp Lys Glu Pro Thr Ser Glu 

95 100 105 

Glu Arg Ser Tyr Trp Lys Ser Gln Val Asn His Asp Ile Cys Arg 

110 115 120 

Ser Glu Pro Val Ser Ala Ala Val Leu Arg Lys Leu Gln Val Glu 

125 130 135 

Trp Pro Phe Gln Glu Leu Leu Leu Leu Arg Glu Gln Tyr Gln Lys 

140 145 150 

Glu Phe Lys Asn Ser Thr Tyr Ser Arg Ser Ser Val Asp Met Leu 

155 160 165 

Tyr Ser Phe Ala Lys Cys Ser Gly Leu Asp Leu Ile Phe Gly Leu 

170 175 180 

Asn Ala Leu Leu Arg Thr Pro Asp Leu Arg Trp Asn Ser Ser Asn 

185 190 195 

Ala Gln Leu Leu Leu Asp Tyr Cys Ser Ser Lys Gly Tyr Asn Ile 

200 205 210 

Ser Trp Glu Leu Gly Asn Glu Pro Asn Ser Phe Trp Lys Lys Ala 

215 220 225 

His Ile Leu Ile Asp Gly Leu Gln Leu Gly Glu Asp Phe Val Glu 

230 235 240 

Leu His Lys Leu Leu Gln Arg Ser Ala Phe Gln Asn Ala Lys Leu 

245 250 255 

Tyr Gly Pro Asp Ile Gly Gln Pro Arg Gly Lys Thr Val Lys Leu 

260 265 270 

Leu Arg Ser Phe Leu Lys Ala Gly Gly Glu Val Ile Asp Ser Leu 

275 280 285 

Thr Trp His His Tyr Tyr Leu Asn Gly Arg Ile Ala Thr Lys Glu 

290 295 300 

Asp Phe Leu Ser Ser Asp Ala Leu Asp Thr Phe Ile Leu Ser Val 

305 310 315 

Gln Lys Ile Leu Lys Val Thr Lys Glu Ile Thr Pro Gly Lys Lys 

320 325 330 

Val Trp Leu Gly Glu Thr Ser Ser Ala Tyr Gly Gly Gly Ala Pro 

335 340 345 

Leu Leu Ser Asn Thr Phe Ala Ala Gly Phe Met Trp Leu Asp Lys 

350 355 360 

Leu Gly Leu Ser Ala Gln Met Gly Ile Glu Val Val Met Arg Gln 

365 370 375 

Val Phe Phe Gly Ala Gly Asn Tyr His Leu Val Asp Glu Asn Phe 

380 385 390 

Glu Pro Leu Pro Asp Tyr Trp Leu Ser Leu Leu Phe Lys Lys Leu 

395 400 405 

Val Gly Pro Arg Val Leu Leu Ser Arg Val Lys Gly Pro Asp Arg 

410 415 420 

Ser Lys Leu Arg Val Tyr Leu His Cys Thr Asn Val Tyr His Pro 

425 430 435 

Arg Tyr Gln Glu Gly Asp Leu Thr Leu Tyr Val Leu Asn Leu His 

440 445 450 

Asn Val Thr Lys His Leu Lys Val Pro Pro Pro Leu Phe Arg Lys 

455 460 465 

Pro Val Asp Thr Tyr Leu Leu Lys Pro Ser Gly Pro Asp Gly Leu 

470 475 480 

Leu Ser Lys Ser Val Gln Leu Asn Gly Gln Ile Leu Lys Met Val 

485 490 495 

Asp Glu Gln Thr Leu Pro Ala Leu Thr Glu Lys Pro Leu Pro Ala 

500 505 510 

Gly Ser Ala Leu Ser Leu Pro Ala Phe Ser Tyr Gly Phe Phe Val 

515 520 525 

Ile Arg Asn Ala Lys Ile Ala Ala Cys Ile 

530 535 

(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2396 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: double 

(D) TOPOLOGY : linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45 



TT TCT AGT 8 
TGC TTT TAG CCA ATG TCG GAT CAG GTT TTT CAA GCG ACA AAG AGA 53 
TAC TGA GAT CCT GGG CAG AGG ACA TCC TAG CTC GGT CAG ATT TGG 98 
GCA GGC TCA AGT GAC CAG TGT CTT AAG GCA GAA GGG AGT CGG GGT 143 
AGG GTC TGG CTG AAC CCT CAA CCG GGG CTT TTA ACT CAG GGT CTA 188 
GTC CTG GCG CCA AAT GGA TGG GAC CTA GAA AAG GTG ACA GAG TGC 233 
GCA GGA CAC CAG GAA GCT GGT CCC ACC CCT GCG CGG CTC CCG GGC 278 
GCT CCC TCC CCA GGC CTC CGA GGA TCT TGG ATT CTG GCC ACC TCC 323 
GCA CCC TTT GGA TGG GTG TGG ATG ATT TCA AAA GTG GAC GTG ACC 368 
GCG GCG GAG GGG AAA GCC AGC ACG GAA ATG AAA GAG AGC GAG GAG 413 
GGG AGG GCG GGG AGG GGA GGG CGC TAG GGA GGG ACT CCC GGG AGG 458 
GGT GGG AGG GAT GGA GCG CTG TGG GAG GGT ACT GAG TCC TGG CGC 503 
CAG AGG CGA AGC AGG ACC GGT TGC AGG GGG CTT GAG CCA GCG CGC 548 
CGG CTG CCC CAG CTC TCC CGG CAG CGG GCG GTC CAG CCA GGT GGG 593 
ATG CTG AGG CTG CTG CTG CTG TGG CTC TGG GGG CCG CTC GGT GCC 638 
Met Leu Arg Leu Leu Leu Leu Trp Leu Trp Gly Pro Leu Gly Ala 

5 10 15 

CTG GCC CAG GGC GCC CCC GCG GGG ACC GCG CCG ACC GAC GAC GTG 683 
Leu Ala Gln Gly Ala Pro Ala Gly Thr Ala Pro Thr Asp Asp Val 

20 25 30 

GTA GAC TTG GAG TTT TAC ACC AAG CGG CCG CTC CGA AGC GTG AGT 728 
Val Asp Leu Glu Phe Tyr Thr Lys Arg Pro Leu Arg Ser Val Ser 

35 40 45 

CCC TCG TTC CTG TCC ATC ACC ATC GAC GCC AGC CTG GCC ACC GAC 773 
Pro Ser Phe Leu Ser Ile Thr Ile Asp Ala Ser Leu Ala Thr Asp 

50 55 60 

CCG CGC TTC CTC ACC TTC CTG GGC TCT CCA AGG CTC CGT GCT CTG 818 
Pro Arg Phe Leu Thr Phe Leu Gly Ser Pro Arg Leu Arg Ala Leu 

65 70 75 

GCT AGA GGC TTA TCT CCT GCA TAC TTG AGA TTT GGC GGC ACA AAG 863 
Ala Arg Gly Leu Ser Pro Ala Tyr Leu Arg Phe Gly Gly Thr Lys 

80 85 90 

ACT GAC TTC CTT ATT TTT GAT CCG GAC AAG GAA CCG ACT TCC GAA 908 
Thr Asp Phe Leu Ile Phe Asp Pro Asp Lys Glu Pro Thr Ser Glu 

95 100 105 

GAA AGA AGT TAC TGG AAA TCT CAA GTC AAC CAT GAT ATT TGC AGG 953 
Glu Arg Ser Tyr Trp Lys Ser Gln Val Asn His Asp Ile Cys Arg 

110 115 120 

TCT GAG CCG GTC TCT GCT GCG GTG TTG AGG AAA CTC CAG GTG GAA 998 
Ser Glu Pro Val Ser Ala Ala Val Leu Arg Lys Leu Gln Val Glu 

125 130 135 

TGG CCC TTC CAG GAG CTG TTG CTG CTC CGA GAG CAG TAC CAA AAG 1043 
Trp Pro Phe Gln Glu Leu Leu Leu Leu Arg Glu Gln Tyr Gln Lys 

140 145 150 

GAG TTC AAG AAC AGC ACC TAC TCA AGA AGC TCA GTG GAC ATG CTC 1088 
Glu Phe Lys Asn Ser Thr Tyr Ser Arg Ser Ser Val Asp Met Leu 

155 160 165 

TAC AGT TTT GCC AAG TGC TCG GGG TTA GAC CTG ATC TTT GGT CTA 1133 
Tyr Ser Phe Ala Lys Cys Ser Gly Leu Asp Leu Ile Phe Gly Leu 

170 175 180 

AAT GCG TTA CTA CGA ACC CCA GAC TTA CGG TGG AAC AGc TCC AAC 1178 
Asn Ala Leu Leu Arg Thr Pro Asp Leu Arg Trp Asn Ser Ser Asn 

185 190 195 

GCC CAG CTT CTC CTT GAC TAC TGC TCT TCC AAG GGT TAT AAC ATc 1223 
Ala Gln Leu Leu Leu Asp Tyr Cys Ser Ser Lys Gly Tyr Asn Ile 

200 205 210 

TCC TGG GAA CTG GGC AAT GAG CCC AAC AGT TTC TGG AAG AAA GCT 1268 
Ser Trp Glu Leu Gly Asn Glu Pro Asn Ser Phe Trp Lys Lys Ala 

215 220 225 

CAC ATT CTC ATC GAT GGG TTG CAG TTA GGA GAA GAC TTT GTG GAG 1313 
His Ile Leu Ile Asp Gly Leu Gln Leu Gly Glu Asp Phe Val Glu 

230 235 240 

TTG CAT AAA CTT CTA CAA AGG TCA GCT TTC CAA AAT GCA AAA CTC 1358 
Leu His Lys Leu Leu Gln Arg Ser Ala Phe Gln Asn Ala Lys Leu 

245 250 255 

TAT GGT CCT GAC ATC GGT CAG CCT CGA GGG AAG ACA GTT AAA CTG 1403 
Tyr Gly Pro Asp Ile Gly Gln Pro Arg Gly Lys Thr Val Lys Leu 

260 265 270 

CTG AGG AGT TTC CTG AAG GCT GGC GGA GAA GTG ATC GAC TCT CTT 1448 
Leu Arg Ser Phe Leu Lys Ala Gly Gly Glu Val Ile Asp Ser Leu 

275 280 285 

ACA TGG CAT CAC TAT TAC TTG AAT GGA CGC ATC GCT ACC AAA GAA 1493 
Thr Trp His His Tyr Tyr Leu Asn Gly Arg Ile Ala Thr Lys Glu 

290 295 300 

GAT TTT CTG AGC TCT GAT GCG CTG GAC ACT TTT ATT CTC TCT GTG 1538 
Asp Phe Leu Ser Ser Asp Ala Leu Asp Thr Phe Ile Leu Ser Val 

305 310 315 

CAA AAA ATT CTG AAG GTC ACT AAA GAG ATC ACA CCT GGC AAG AAG 1583 
Gln Lys Ile Leu Lys Val Thr Lys Glu Ile Thr Pro Gly Lys Lys 

320 325 330 

GTC TGG TTG GGA GAG ACG AGC TCA GCT TAC GGT GGC GGT GCA CCC 1628 
Val Trp Leu Gly Glu Thr Ser Ser Ala Tyr Gly Gly Gly Ala Pro 

335 340 345 

TTG CTG TCC AAC ACC TTT GCA GCT GGC TTT ATG TGG CTG GAT AAA 1673 
Leu Leu Ser Asn Thr Phe Ala Ala Gly Phe Met Trp Leu Asp Lys 

350 355 360 

TTG GGC CTG TCA GCC CAG ATG GGC ATA GAA GTC GTG ATG AGG CAG 1718 
Leu Gly Leu Ser Ala Gln Met Gly Ile Glu Val Val Met Arg Gln 

365 370 375 

GTG TTC TTC GGA GCA GGC AAC TAC CAC TTA GTG GAT GAA AAC TTT 1763 
Val Phe Phe Gly Ala Gly Asn Tyr His Leu Val Asp Glu Asn Phe 

380 385 390 

GAG CCT TTA CCT GAT TAC TGG CTC TCT CTT CTG TTC AAG AAA CTG 1808 
Glu Pro Leu Pro Asp Tyr Trp Leu Ser Leu Leu Phe Lys Lys Leu 

395 400 405 

GTA GGT CCC AGG GTG TTA CTG TCA AGA GTG AAA GGC CCA GAC AGG 1853 
Val Gly Pro Arg Val Leu Leu Ser Arg Val Lys Gly Pro Asp Arg 

410 415 420 

AGC AAA CTC CGA GTG TAT CTC CAC TGC ACT AAC GTC TAT CAC CCA 1898 
Ser Lys Leu Arg Val Tyr Leu His Cys Thr Asn Val Tyr His Pro 

425 430 435 

CGA TAT CAG GAA GGA GAT CTA ACT CTG TAT GTC CTG AAC CTC CAT 1943 
Arg Tyr Gln Glu Gly Asp Leu Thr Leu Tyr Val Leu Asn Leu His 

440 445 450 

AAT GTC ACC AAG CAC TTG AAG GTA CCG CCT CCG TTG TTC AGG AAA 1988 
Asn Val Thr Lys His Leu Lys Val Pro Pro Pro Leu Phe Arg Lys 

455 460 465 

CCA GTG GAT ACG TAC CTT CTG AAG CCT TCG GGG CCG GAT GGA TTA 2033 
Pro Val Asp Thr Tyr Leu Leu Lys Pro Ser Gly Pro Asp Gly Leu 

470 475 480 

CTT TCC AAA TCT GTC CAA CTG AAC GGT CAA ATT CTG AAG ATG GTG 2078 
Leu Ser Lys Ser Val Gln Leu Asn Gly Gln Ile Leu Lys Met Val 

485 490 495 

GAT GAG CAG ACC CTG CCA GCT TTG ACA GAA AAA CCT CTC CCC GCA 2123 
Asp Glu Gln Thr Leu Pro Ala Leu Thr Glu Lys Pro Leu Pro Ala 

500 505 510 

GGA AGT GCA CTA AGC CTG CCT GCC TTT TCC TAT GGT TTT TTT GTC 2168 
Gly Ser Ala Leu Ser Leu Pro Ala Phe Ser Tyr Gly Phe Phe Val 

515 520 525 

ATA AGA AAT GCC AAA ATC GCT GCT TGT ATA TGA AAA TAA AAG GCA 2213 
Ile Arg Asn Ala Lys Ile Ala Ala Cys Ile 

530 535 

TAC GGT ACC CCT GAG ACA AAA GCC GAG GGG GGT GTT ATT CAT AAA 2258 
ACA AAA CCC TAG TTT AGG AGG CCA CCT CCT TGC CGA GTT CCA GAG 2303 
CTT CGG GAG GGT GGG GTA CAC TTC AGT ATT ACA TTC AGT GTG GTG 2348 
TTC TCT CTA AGA AGA ATA CTG CAG GTG GTG ACA GTT AAT AGC ACT 2393 
GTG 2396 

(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 385 




(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46 
CGGCCGCTGC TGCTGCTGTG GCTCTGGGGG CGGCTCCGTG CCCTGACCCA 
AGGCACTCCG GCGGGGACCG CGCCGACCAA AGACGTGGTG GACTTGGAGT 
TTTACACCAA GAGGCTATTC CAAAGCGTGA GTCCCTCGTT CCTGTCCATC 
ACCATCGACG CCAGTCTGGC CACCGACCCT CGGTTCCTCA CCTTCCTGAG 
CTCTCCACGG CTTCGAGCCC TGTCTAGAGG CTTATCTCCT GCGTACTTGA 
GATTTGGCGG CACCAAGACT GACTTCCTTA TTTTTGATCC CAACAACGAA 
CCCACCTCTG AAGAAAGAAG TTACTGGCAA TCTCAAGACA ACAATGATAT 
TTGCGGGTCT GACCGGGTCT CCGCTGACGT GTTGA 

(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 





(A) LENGTH: 541 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47 

AAATCAGGAC ATATCCTTCA CTTATTTGCC TCTTGGTCAT ATTGGAGGCA 50 
TTTGTATTCA TTTTTAATAA CCCTCAAAAT AGTGCATGCA AAGTGCTAAG 100 
CGTCATTTGC CACATGGTGC CATTAACTGT CACCACCTGC AGTGGTCTAC 150 
TTAGAGAACA CCGCACTGGA TGTTAACACT GAAGCGCGTG CCCCGCCCTC 200 
CCGAGGCTCT GGATCCAGCG TTGAAGCTTG CCCCGCCCTC CCGAGGCTCT 250 
GGATCCAGCA CTGGAGCATG CCCCGCCCTC CCGAGGCTCT GGAGCTTGCT 300 
AAGGAGTCCG CTCCCTACCG CTGGGGTTTT GCTTTATTCT TATGAATGAC 350 
ACCCCTGACC GCTTTCGTCT CAGGGGTACT GTAATGCCTT TTATTTTCAT 400 
ATACAAGCTG CGATTTTGGC ATTTCTTATG ACAAAAAACC CATAGGAAAA 450 
GGCGGGCACG CTTAGTGAGC TTCCTGCGGG GAGAGGTTTT TCTGTTAGAG 500 
CTGGCANGGT CTGCTCATCG ACCATCTTCA GGCCTCGTGC C 541 



